[ZODB-Dev] ZODB idioms

Tim Peters tim@zope.com
Mon, 24 Jun 2002 11:51:08 -0400


[Jeremy Hylton]
> BTrees do not rebalance themselves.

It's not that simple.  While Zope's flavor of BTree doesn't try to rebalance
after deletions, they're perfectly height-balanced so long as you only
insert:

>>> from BTrees.IIBTree import *
>>> t = IITreeSet()
>>> t.update(xrange(1000000))
1000000
>>> t.has_key(0)
3
>>> t.has_key(999999)
3
>>>

has_key() actually returns the depth of the bucket in which a key is found.
All the buckets in this million-element BTree set are at depth 3, and it
always takes exactly 3 relatively small binary searches to determine whether
a key is present.

Under a mixture of inserts *and* deletes, this flavor of BTree can grow
height-unbalanced (although a "classic BTree" cannot), because this flavor
doesn't do anything to try to maintain height balance under deletions.

> I suspect that the use of random ids in Zope is to avoid balancing
> problems by random insertions.

If so, it's trying to solve a non-problem <wink>.  Perhaps worse, my current
attempts to speed intersections could exploit regularity in ids, due to
"clumping" in real life.

> In the case of sequential ids, each BTree bucket will be half full.

True.

> When the bucket reaches its limit, it will be split into two buckets of
> equal size (call them left and right).  Since the ids are sequential
> the left bucket will never grow and all the new ids will be put in the
> right bucket.

True but largely irrelevant:  the right bucket will split too when it gets
too big.  The bucket depths remain equal.  I think far too big a deal is
made out of that buckets may be *filled* to only 50% of maximum capacity.
Indeed, Zope's BTrees have relatively huge BTree node capacities, and
filling them to the max costs on the database end (any change to a node
requires replacing the entire node).

> ...
> The period of Python's random number generator is slightly more than
> 6.9 * 10 ** 12.
> (http://www.python.org/doc/current/lib/module-random.html)
> I think reseeding will do more harm than good.

Me too.

[Toby Dickenson]
> Eeeeek. There are a number of cases in Zope where I suspect this could be
> catastrophic.
>
> For example, ZCatalog creates a standard index for every object's last
> modified time. The reverse index OIBTree is keyed on last
> modified time, and is therefore likely to be grossly unbalanced.

So long as there are no deletions, it will be perfectly balanced.  I don't
know how unbalanced things get in real life in the presence of deletions
too, but am working in the background on a tool that will be able to tell
us.

[Casey Duncan]
> Hmmm, I thought the B stood for "Balanced". Isn't that false
> advertising? ;^) Unless of course it stands for binary...

Believe it or not, the B-Tree inventors refused to say what *they* thought
the "B" stood for.  Their flavor of BTree did remain balanced under
deletions, though.