[ZODB-Dev] IIBTree.multiunion and list comprehensions

Casey Duncan casey at zope.com
Wed Dec 10 09:27:51 EST 2003


On Wed, 10 Dec 2003 09:46:04 -0200
Christian Robottom Reis <kiko at async.com.br> wrote:

> On Tue, Dec 09, 2003 at 11:37:03PM -0500, Casey Duncan wrote:
> > >     multiunion(tree.keys())
> > 
> > The *keys* of the tree are sets? Huh?
> 
> You *know* I meant tree.values() <wink>

8^)
 
> > > takes about the same time to run as
> > > 
> > >     [value for set in tree.keys() for value in set] (*)
> > 
> > So you create a set out of this get the unique members?
> 
> Actually, all I want is a sequence of values -- it could be a set or a
> list.
> 
> What I'm doing is collating all the values in all the sets in my BTree.
> This BTree holds as values IITreeSets containing a number of integer
> OIDs. When I want to pick up *all* the values in the BTree, I need to
> join all these IITreeSets together and produce one big set. The fact
> that it's unique doesn't really matter in my case given that any given
> OID appears only once [as a value].

In that case it may be cheapest for you not to do the union up front at all. How about creating a lazy iterator that just walks down every member of every set in the tree? The big downside of that is that you can't really do anything with the result set besides iterating it, such as sorting or intersecting it.
 
> > > Note that multiunion returns an IISet(), while the latter returns a
> > > list; converting between is pretty cheap given the context.
> > 
> > Note that creating IITreeSets is often fastest because it allocates
> > new buckets as it grows rather than resizing a single memory block. An
> > IISet is essentially just a sorted C array of unique ints.
> 
> IIRC, IITreeSets also have the advantage of "unghosting as you go",
> whereas IISets are unpersisted completely when accessing one of its
> items. I'm wondering why multiunion creates IISets instead of
> IITreeSets, though.

Probably because they are a lot simpler, and they are intended for in memory use. Its also possible (although I don't know if its implemented this way) for multiunion to know the maximum possible size up front. In that case the memory for the set's array could all be allocated once. In fact, the C code could just work with an int array and then make it into an IISet at end. Plus searches and insertions are simpler operations on an IISet.

-Casey



More information about the ZODB-Dev mailing list