[ZODB-Dev] n-way join algoritms

Roché Compaan roche at upfrontsystems.co.za
Sun Jun 22 20:18:39 EDT 2003


* Steve Alexander <steve at cat-box.net> [2003-06-22 17:01]:
> Roché Compaan wrote:
> >* Phillip J. Eby <pje at telecommunity.com> [2003-06-22 00:13]:
> >
> >>I don't understand what you're saying.  But, the basic merge algorithm 
> >>for joining is trivial.  All of the complexity of joining comes from 
> >>optimization.
> >>
> >>The most trivial algorithm would look like this:
> >>
> >>for a in A:
> >>   for b in B:
> >>       if a.x == b.x:
> >>           for c in C:
> >>               if b.y==c.y:
> >>                   for d in D:
> >>                       if c.z==d.z:
> >>                           yield a,b,c,d
> >
> >
> >Thanks Phillip this will work perfectly. I already have a filtered
> >result per class and there are indexes for all the joins available so it
> >should be quite fast.
> 
> You might also want to look at the operations described in IMerge of the 
> BTrees package.

Thanks Steve, I already use "intersection" and "union" extensively.

Let me elaborate a little for those who are interested. In one case I
query on three metatypes: Customer, ContactPerson and License. Customers
and Contacts are associated with each other in a CustomerContacts
association and Customers and Licenses are associated in a
LicensedCustomer association. Associations use a design similar to the
Catalog. It uses IOBTrees to map record ids to record ids. The
difference is that an Association has two ends where each end maintains
a mapping of record ids for instances of classes for that end, to record
ids for instances of classes for the other end. 

There was a thread about associations/relationships about 2 months ago
on this list. We prototyped several approaches and some turned out to
work really well. What is super about having associations in a ZODB app
is that you don't use RDBMS semantics to do something object oriented.
In our older apps we used to have lots of OtherClassID attributes on
classes representing the relationship between objects eg. a
ContactPerson would have a CustomerID. Now ContactPerson has a Customer
reference which is accessible and directly manipulated as an attribute,
ContactPerson.Customer. Since references are bidirectional Customers can
access their contacts through its Customer.ContactPersons reference.
References are ComputedAttributes.

With associations in place we needed to work out how to query the
catalog and associations easily, so we built a unified query service.
One can query the service in the same way one queries the catalog
eg. UnifiedQueryService(
        meta_type='ContactPerson', 
        Name='Roché Compaan',
        Customer=<Customer1 at ..>
    )
The query service figures out which keys are properties and which are
references and then delegates queries on properties to the catalog and
queries on references to the appropriate association. It returns an
intersection of the catalog and association results. Associations share
record ids with the Catalog so we don't have to make expensive getObject
calls to make the intersection.

The next step was application specific but can also turn out to be
generally useful. We needed to join results for different meta_types
returned by the unified query service. I was working on this over the
weekend and that's why I was curious about n-way join algorithms. So far
it looks very promising and I am very excited to have the ability to do
complex joins on objects in the ZODB. Having infrastructure like
associations seems to makes this very possible and very fast. We hope to
release our ZAssociations product soon :-)

-- 
Roché Compaan
Upfront Systems                 http://www.upfrontsystems.co.za



More information about the ZODB-Dev mailing list