[ZODB-Dev] n-way join algoritms

Phillip J. Eby pje at telecommunity.com
Sat Jun 21 19:11:24 EDT 2003


At 09:18 PM 6/21/03 +0200, Roché Compaan wrote:
>Does anybody now of an example n-way join algorithm in python or a
>resource that may get me going. I've googled for them but mainly ran
>into very technical papers with lots a relational algebra which I am not
>familiar with.

Did you look at Aaron Watters' paper on Gadfly 2?


>I am building a query service for a Zope app that uses the catalog for
>simple queries on properties and an associations repository for
>associations between objects.
>
>In the absence of a good algorithm I made one that goes like this:
>
>Say I query on metatypes a,b,c and d. The result should be joined with
>the condition: a.x == b.x and b.y == c.y and c.z == d.z. What I do at
>the moment is to process conditions from left to right and then from
>right to left and I seem to be getting the right result:
>
>left to right:
>     a.x == b.x: filter a and b
>     b.y == c.y: filter b and c
>     c.z == d.z: filter c and d
>right to left:
>     b.y == c.y: filter b and c
>     a.x == b.x: filter a and b

I don't understand what you're saying.  But, the basic merge algorithm for 
joining is trivial.  All of the complexity of joining comes from optimization.

The most trivial algorithm would look like this:

for a in A:
     for b in B:
         if a.x == b.x:
             for c in C:
                 if b.y==c.y:
                     for d in D:
                         if c.z==d.z:
                             yield a,b,c,d

To speed this up a bit, wherever there is an index available, you can 
replace the 'for' loop with something that loops only through the items 
with a match from the preceding outer loop.

Further optimization is then much trickier.  If you have additional search 
arguments ("sargs"), you can try to predict which of A, B, C, and D would 
have the fewest matching items, and then you would use that set for the 
outermost loop.  (Notice that from a correctness POV, it doesn't matter 
what order the loops are nested, although the loops don't always look as 
simple when you change the order.)  Likewise, if some items have indexes 
and others don't, you may want to move the indexed loops inward, since they 
will go faster than unindexed searches.

Past this point, things then start to get really interesting, and you 
should probably just use a database that already has this stuff built in.




More information about the ZODB-Dev mailing list