[ZODB-Dev] [BTrees] considerable optimization potential in"intersection"

Tim Peters tim at zope.com
Thu Aug 5 00:47:16 EDT 2004


[Andreas Jung]
> The vastly different sizes is the key point for this optimization.  In
> this case I have the knowledge on the application level that the first
> BTree will never be larger than several thousand items compared to
> several hundred thousand items in the seconds BTree.  Making this
> optimization decision on the BTree level might not work since you can not
> determine the size of a BTree without reading all keys which is totally
> inefficient

That's correct.  A BTree has no idea how big it is, and no way to find out
for sure short of loading every bucket in the tree.  You can get a (possibly
very) poor guess by scanning some small number of parent nodes "top down",
and making lots of dubious "average case" assumptions.

The algorithms in the paper I referenced generally use strategies that
really don't *need* to know the sizes in advance, although the paper doesn't
consider that case (I did).

> (btw. do btrees not implement __len__ through a counter?).

I'm afraid I'd have to speak German to parse that question <wink>.

__len__ is implemented by loading every bucket in the tree, and adding
together the size of each bucket.

A global "number of items in the whole tree" member would need to be updated
on every delete and insert, and conflicts would be rampant as a result.  The
way things are, multiple transactions mucking with distinct buckets
generally don't interfere with each other at all.

You're aware of Length.py, right?  That supplies a persistent counter class
with appropriate conflict resolution (never fails).  You have to keep that
in synch with a BTree yourself, telling the Length object you associate with
a given BTree how many items you're adding to, or deleting from, the tree.
That's one way to get the length cheaply, but at some effort, and in a way
that's easy to get out of synch with the true current size.

With enough effort, I think that could get folded into the BTree
implementation.  Nobody up my mgmt chain has suggested that's a priority,
though <wink>.



More information about the ZODB-Dev mailing list