[Zope3-dev] Yet Another Relations (aka Reference) Engine...

Fri Nov 11 12:00:57 EST 2005

Helmut Merz wrote:

>On Friday, 11 November 2005 16:11, Jean-Marc Orliaguet wrote:
>
>>Hi Helmut!
>
>Hi Jean-Marc,
>
>thanks for your remarks,
>
>just before going into more detail: My primary concern was the 
>API - it would really be fine if there could be a simple (as simple 
>as possible but not simpler) standard set of (low-level) 
>interfaces on which to build (defining semantically richer 
>interfaces) and for which to provide implementations (depending 
>on the needs of the application).
>
>The implementation with the catalog should serve as a (again 
>fairly simple but working) example and a proof of concept; I 
>think I would just be lucky if it turned out to be really 
>useful ;-) (but maybe...)
>

Hi,

a common interface could be useful indeed.
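
For illustration only, a minimal sketch of what such a common low-level API could look like in plain Python. I'm using `typing.Protocol` instead of zope.interface to keep it self-contained, and all names (`IRelation`, `IRelationStorage`, `SimpleRelation`, `ListStorage`) are hypothetical, not taken from CPSSkins or from Helmut's package:

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Protocol, Tuple


class IRelation(Protocol):
    """Hypothetical minimal relation: a predicate plus an ordered tuple of elements."""

    predicate: Any
    elements: Tuple[Any, ...]


class IRelationStorage(Protocol):
    """Hypothetical minimal storage API: add, remove and a deliberately dumb query."""

    def add(self, relation: IRelation) -> None: ...

    def remove(self, relation: IRelation) -> None: ...

    def query(self, predicate: Any = None, first: Any = None) -> Iterator[IRelation]: ...


@dataclass(frozen=True)
class SimpleRelation:
    predicate: Any
    elements: Tuple[Any, ...]


class ListStorage:
    """Naive implementation of IRelationStorage: a linear scan, no indexes."""

    def __init__(self) -> None:
        self._relations: List[IRelation] = []

    def add(self, relation: IRelation) -> None:
        self._relations.append(relation)

    def remove(self, relation: IRelation) -> None:
        self._relations.remove(relation)

    def query(self, predicate: Any = None, first: Any = None) -> Iterator[IRelation]:
        # None means "don't care"; otherwise match the predicate and/or first element.
        for relation in self._relations:
            if predicate is not None and relation.predicate != predicate:
                continue
            if first is not None and relation.elements[0] != first:
                continue
            yield relation


# Helmut's use case: the subtasks and resources of a task.
storage: IRelationStorage = ListStorage()
storage.add(SimpleRelation("subtask", ("task1", "task2")))
storage.add(SimpleRelation("resource", ("task1", "alice")))
subtasks = [r.elements[1] for r in storage.query(predicate="subtask", first="task1")]
```

Semantically richer interfaces, or a catalog-based storage, could then be layered on top of or implemented behind these two.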

>>I can tell you about the design decisions made in the case of
>>the relation tool included in CPSSkins. They don't necessarily
>>appear in the code itself in an obvious way.
>>
>>1) separate storage from storage policy
>>
>>the relation storage stores what it is told to store: as long
>>as the objects are Relatable, they can be stored. The storage
>>policy (using unique ids or not, etc.) is the responsibility
>>of the application itself. Imposing a unique id policy when
>>storing elements would be a mistake in my opinion (in the case
>>of cpsskins it wouldn't work either).
>
>The only prerequisite for using the IntIds utility is that the 
>objects are persistent (provide IPersistent). If you want to 
>relate objects that are not persistent, or have relations that 
>for some reason can't be persistent, you can't use the catalog 
>approach because the catalog depends on IntIds.
>
>So the catalog-based implementation won't be usable for relations 
>between in-memory objects (like views, adapters and related 
>stuff), that's true.

I was thinking more about the policy of assigning unique ids to objects
in a relation. It's really the application that should decide on that
policy.
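
To make point 1 concrete, a rough sketch in plain Python (all names hypothetical) of a storage that knows nothing about ids: the application passes in an id policy. `identity_policy` works for transient objects like views or adapters; `CounterIdPolicy` hands out integer ids and only loosely mimics the idea of an IntIds utility, not its actual API:

```python
import itertools
from typing import Any, Callable, Dict, Hashable, List, Tuple

# An id policy maps an element to the key under which the storage indexes it.
IdPolicy = Callable[[Any], Hashable]
Relation = Tuple[Any, Tuple[Any, ...]]  # (predicate, elements)


def identity_policy(obj: Any) -> Hashable:
    """Key by object identity: fine for transient, non-persistent objects.
    The storage keeps a reference to each stored element, so its id stays valid."""
    return id(obj)


class CounterIdPolicy:
    """Hands out integer ids on first use (loosely like an IntIds utility)."""

    def __init__(self) -> None:
        self._ids: Dict[Any, int] = {}
        self._counter = itertools.count(1)

    def __call__(self, obj: Any) -> Hashable:
        if obj not in self._ids:
            self._ids[obj] = next(self._counter)
        return self._ids[obj]


class RelationStorage:
    """Stores whatever it is told to store; the id policy is the application's choice."""

    def __init__(self, id_policy: IdPolicy) -> None:
        self._id_policy = id_policy
        self._relations: List[Relation] = []
        self._by_key: Dict[Hashable, List[int]] = {}

    def add(self, predicate: Any, *elements: Any) -> None:
        position = len(self._relations)
        self._relations.append((predicate, elements))  # keeps the elements alive
        for element in elements:
            self._by_key.setdefault(self._id_policy(element), []).append(position)

    def relations_of(self, element: Any) -> List[Relation]:
        positions = self._by_key.get(self._id_policy(element), [])
        return [self._relations[i] for i in positions]


class View:
    """Stands in for a transient object such as a view or an adapter."""


page, widget = View(), View()
transient = RelationStorage(identity_policy)
transient.add("renders", page, widget)

persistent = RelationStorage(CounterIdPolicy())
persistent.add("subtask", "task1", "task2")
```

The storage code is identical in both cases; only the policy handed to it changes.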

>>2) keep the relation storage index as small as possible.
>>
>>Do not index predicates: the same predicates are used in too
>>many relations, so the size of the index would just increase
>>dramatically. Instead, only index the elements that are inside
>>the relation; the chances that the same elements are related
>>in many different ways are very low.
>>
>>    cf.
>>
>>    http://www.z3lab.org/sections/blogs/jean-marc-orliaguet/2005_08_27_triadic-relations/
>>    http://svn.nuxeo.org/trac/pub/file/z3lab/cpsskins/branches/jmo-perspectives/storage/relations.py
>>
>
>I read this, and it indeed gave me the impression that it might 
>be a not so bad idea to use a catalog ;-)
>

well, you haven't written the catalog indexes yet :-)

And Lennart wrote a piece about the kinds of problems you'll run into if
you don't optimize them for relations. You'll end up with intersections
of huge sets:
http://blogs.nuxeo.com/sections/blogs/lennart_regebro/2005_08_29_indexing-events
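
A rough illustration of point 2 and of Lennart's remark, in plain Python with made-up names (this is neither the CPSSkins storage nor a catalog index): relations are indexed only by their elements, and the predicate is checked on the few candidates found, so a query never has to intersect with the set of all relations sharing a predicate:

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Relation = Tuple[str, Tuple[Any, ...]]  # (predicate, elements)


class ElementIndexedStorage:
    """Indexes relations by the elements they contain, never by predicate."""

    def __init__(self) -> None:
        self._relations: List[Relation] = []
        self._index: Dict[Any, Set[int]] = defaultdict(set)

    def add(self, predicate: str, *elements: Any) -> None:
        rid = len(self._relations)
        self._relations.append((predicate, elements))
        for element in elements:
            self._index[element].add(rid)

    def query(self, element: Any, predicate: str) -> List[Relation]:
        # The element's set of relation ids is small, so checking the
        # predicate on each candidate is cheap; there is no intersection
        # with a huge "every relation using this predicate" set.
        candidates = self._index.get(element, set())
        return [self._relations[rid] for rid in sorted(candidates)
                if self._relations[rid][0] == predicate]

    def postings(self, element: Any) -> int:
        """Number of relation ids stored under one element."""
        return len(self._index.get(element, ()))


storage = ElementIndexedStorage()
for i in range(10_000):
    storage.add("subtask", f"task{i}", f"task{i + 1}")
storage.add("resource", "task7", "alice")

# A predicate index would hold 10,000 ids under "subtask" alone;
# the "task7" entry of the element index holds only 3.
print(storage.postings("task7"))
print(storage.query("task7", "subtask"))
```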

>>    I don't know about using zc.catalog for indexing
>>relations; you could end up with huge indexes and very slow
>>queries.
>
>This is one of my concerns, too, but I'm fairly optimistic: the 
>catalog indexes store a common string to be indexed only once, 
>so having identical ; I'm working with the Archetypes reference 
>engine (that uses - at least in this respect - the same kind of 
>catalog indexes) in situations with many thousands of objects 
>and haven't run into problems of this kind.
>
Looking at the code, Archetypes does not store relations; it stores
'references' (and backward references), consisting of a target object
and a predicate ('relationship'), in the objects themselves. I guess
the objects are then indexed in the catalog. So the relation is stored
implicitly, and there is no explicit relation object to start with. So
the model is a bit different, I guess.
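
Roughly, the two models side by side, as a plain Python sketch with hypothetical classes (it neither uses nor reproduces the Archetypes API): in the first, each object carries (predicate, target) pairs and back references; in the second, the relation is an explicit object of its own, which also allows more than two elements:

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(eq=False)
class Content:
    """Reference model: the object stores (predicate, target) pairs itself,
    and the target stores the matching back reference."""

    name: str
    references: List[Tuple[str, "Content"]] = field(default_factory=list)
    back_references: List[Tuple[str, "Content"]] = field(default_factory=list)

    def add_reference(self, predicate: str, target: "Content") -> None:
        self.references.append((predicate, target))
        target.back_references.append((predicate, self))


@dataclass(frozen=True)
class Relation:
    """Explicit model: the relation is a first-class object; the related
    elements themselves store nothing about it."""

    predicate: str
    elements: Tuple[Any, ...]


task, subtask, alice = Content("task"), Content("subtask"), Content("alice")

# Reference model: the relation exists only implicitly, spread over two objects.
task.add_reference("subtask", subtask)

# Explicit model: one object per relation, here a triadic one.
allocation = Relation("allocated", (alice, task, "50%"))
```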

>>3) don't make the API for querying the storage too
>>intelligent,
>
>The query() method using the catalog's searchResults() / apply() 
>methods was the dumbest one I could think of ;-)
>
>>to create complex queries, create complex predicates instead,
>>i.e.
>>
>>   - predicates that combine several predicates
>>   - proxy predicates (where the predicate is evaluated at
>>     runtime and a method is specified instead)
>>
>>    cf.
>>
>>    http://svn.nuxeo.org/trac/pub/file/z3lab/cpsskins/branches/jmo-perspectives/relations/__init__.py
>>
>>    if you need to do really complex queries, do several
>>queries and filter out the results afterwards in your
>>application, unless you're fine with ending up with a huge
>>catalog index.
>
>To be honest, I never thought about complex queries as I just 
>want to find e.g. the subtasks of a task and the resources 
>allocated to it - maybe my use case is just somewhat simple.
>
>OTOH: one advantage of using a catalog is that the set operations 
>on search results from the indexes are, I think, fairly 
>efficient...
>
>Helmut
>
This opens the door to a combinatorial explosion. The number of
relations between objects literally explodes unless you carefully choose
the relation predicates. The catalog won't help unless you have very
carefully designed indexes, I guess.
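
For point 3, a minimal sketch of what combining predicates might look like, in plain Python. The classes are hypothetical, not the CPSSkins ones, and the compound predicate here simply matches any of its parts (other combinations are possible). The query itself stays dumb, and any further filtering happens in the application:

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Relation = Tuple[Any, Tuple[Any, ...]]  # (predicate, elements)


@dataclass(frozen=True)
class Predicate:
    name: str

    def matches(self, relation: Relation) -> bool:
        return relation[0] == self


@dataclass(frozen=True)
class CompoundPredicate:
    """Combines several predicates: matches if any of them matches."""

    parts: Tuple[Predicate, ...]

    def matches(self, relation: Relation) -> bool:
        return any(part.matches(relation) for part in self.parts)


@dataclass(frozen=True)
class ProxyPredicate:
    """Evaluated at query time by calling a function instead of comparing names."""

    method: Callable[[Relation], bool]

    def matches(self, relation: Relation) -> bool:
        return self.method(relation)


def query(relations: Iterable[Relation], predicate: Any) -> List[Relation]:
    """Deliberately dumb: one pass, no index on predicates."""
    return [r for r in relations if predicate.matches(r)]


subtask, resource, comment = Predicate("subtask"), Predicate("resource"), Predicate("comment")
relations: List[Relation] = [
    (subtask, ("task1", "task2")),
    (resource, ("task1", "alice")),
    (comment, ("task1", "note")),
    (subtask, ("task3", "task4")),
]

# One query with a compound predicate ...
planning = query(relations, CompoundPredicate((subtask, resource)))
# ... then filtering in the application instead of a smarter storage API.
planning_task1 = [r for r in planning if r[1][0] == "task1"]
# A proxy predicate decided at runtime.
involving_alice = query(relations, ProxyPredicate(lambda r: "alice" in r[1]))
```

The cleverness lives in the predicate objects, not in the storage's query API.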

/JM