[ZODB-Dev] IIBTree.multiunion and list comprehensions

Casey Duncan casey at zope.com
Wed Dec 10 10:12:55 EST 2003


On Wed, 10 Dec 2003 12:52:37 -0200
Christian Robottom Reis <kiko at async.com.br> wrote:

> On Wed, Dec 10, 2003 at 09:27:51AM -0500, Casey Duncan wrote:
> > > What I'm doing is collating all the values in all the sets in my BTree.
> > > This BTree holds as values IITreeSets containing a number of integer
> > > OIDs. When I want to pick up *all* the values in the BTree, I need to
> > > join all these IITreeSets together and produce one big set. The fact
> > > that it's unique doesn't really matter in my case given that any given
> > > OID appears only once [as a value].
> > 
> > In that case it may be cheapest for you not to do the union up front
> > at all. How about creating a lazy iterator that just walks down every
> > member of every set in the tree? The big downside of that is that you
> > can't really do anything with the result set besides iterating it,
> > such as sorting or intersecting it.
> 
> This is probably a good idea (though it requires Python 2.2, which we
> haven't yet in IndexedCatalog). Your last phrase confused me, however: do
> you mean I *can* sort and intersect the result set? If so, it would be
> neat, since what I want to do in many cases is intersect this with
> another set (resulting from a boolean query).

Using the new iterator protocol would require Python 2.2, but for Python <= 2.1 you can also use the old one (__getitem__), as ZCatalog's lazy sequences do.
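
As a rough sketch of both approaches (not ZCatalog's actual code): here a plain dict of sorted lists stands in for the BTree of IITreeSets, and the names iter_all_values and LazyValues are made up for illustration.

```python
def iter_all_values(tree):
    """Generator version (needs Python 2.2+): yield every member of
    every set, one set at a time, without building a union."""
    for key in sorted(tree):
        for oid in tree[key]:
            yield oid


class LazyValues:
    """Old-protocol version: iteration works through __getitem__ alone.

    Python calls __getitem__(0), __getitem__(1), ... and stops at the
    first IndexError, so no __iter__ method is required.
    """

    def __init__(self, tree):
        # Stand-in: a real version would hold references to the sets
        # stored in the BTree rather than plain lists.
        self._sets = [tree[key] for key in sorted(tree)]
        self._set_index = 0  # which set the cursor is currently in
        self._base = 0       # global index of that set's first member

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        if index < self._base:
            # Caller went backwards: restart from the first set.
            self._set_index = 0
            self._base = 0
        while self._set_index < len(self._sets):
            current = self._sets[self._set_index]
            offset = index - self._base
            if offset < len(current):
                return current[offset]
            # Index lies beyond this set: move the cursor to the next one.
            self._base += len(current)
            self._set_index += 1
        raise IndexError(index)
```

Either object can be fed to a for loop; the second one also allows random access such as values[2], at the cost of walking forward through the sets to reach that position.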

No, I meant the opposite: you can't really sort or intersect the result without first making it into a full set.
 
> My main worry with *that*, however, is that intersecting will end up
> having a similar cost to doing multiunion, since the cost I'm concerned
> about is the first-run cost (IOW, the cost of unpersisting the actual
> sets) and AFAIHS multiunion is too fast to notice once the sets are
> in-memory.

Yes, if you walk the whole iterator, then you pay the full price of loading every set. The main benefit of the iterator is in cases where you are really only going to use or display part of the result.
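
To illustrate the partial-use case, here is a small sketch. The dict is again a stand-in for the BTree, and the loaded list is a hypothetical marker for "this set had to be unpersisted"; neither comes from your code.

```python
from itertools import islice

loaded = []  # keys whose set was actually touched


def iter_all_values(tree):
    for key in sorted(tree):
        loaded.append(key)  # stand-in for the cost of loading this set
        for oid in tree[key]:
            yield oid


tree = {1: [5, 6], 2: [10, 11], 3: [20]}

# Take only the first three OIDs, e.g. for one screen of results.
first_page = list(islice(iter_all_values(tree), 3))
```

After this runs, only the sets under keys 1 and 2 have been touched; the set under key 3 is never loaded because the consumer stopped asking.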
 
> I realize that first-time costs aren't really important for long-running
> web applications; however, we're running a desktop system, and having
> the initial operation execute slowly doesn't give a nice `first
> impression', if you believe in such things.

Is there any way to defer this to later? Many applications (especially network-bound ones like mail apps) use multiple threads so that the application starts fast and you can interact with it while it works through the slow job of loading everything else.
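
A minimal sketch of that idea, assuming a plain dict in place of the BTree; BackgroundUnion is a hypothetical helper, not an existing API. With ZODB the worker thread should use its own connection rather than sharing the main thread's, which this stand-in does not show.

```python
import threading


class BackgroundUnion:
    """Start computing the union of all sets in a worker thread."""

    def __init__(self, tree):
        self._tree = tree
        self._result = None
        self._done = threading.Event()
        worker = threading.Thread(target=self._load, daemon=True)
        worker.start()

    def _load(self):
        merged = set()
        for key in sorted(self._tree):
            merged.update(self._tree[key])  # the slow part happens here
        self._result = merged
        self._done.set()

    def get(self, timeout=None):
        """Block until the union is ready, then return it."""
        if not self._done.wait(timeout):
            raise TimeoutError("union not ready yet")
        return self._result
```

The application would create the object at startup, show its window right away, and call get() only when the first query actually needs the full set.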

The other option is to simply load less stuff. Is there any way to do this whole multiunion operation ahead of time and persist that (smaller) set? How many users share this database?
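
One way to read "do it ahead of time" is to maintain the union incrementally whenever the index changes. This is only a sketch with built-in sets standing in for the BTree and IITreeSets, and IndexWithUnion is an invented name. It relies on your statement that any given OID appears under only one key.

```python
class IndexWithUnion:
    """Keep a precomputed union alongside the per-key sets."""

    def __init__(self):
        self.sets = {}         # key -> set of OIDs (stand-in for the BTree)
        self.all_oids = set()  # precomputed union (stand-in for one stored set)

    def add(self, key, oid):
        self.sets.setdefault(key, set()).add(oid)
        self.all_oids.add(oid)

    def remove(self, key, oid):
        self.sets[key].discard(oid)
        # Safe only because each OID is stored under a single key.
        self.all_oids.discard(oid)
```

Reading all_oids then costs one lookup instead of a walk over every set, in exchange for a little extra work on each write.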

-Casey


