[ZODB-Dev] ZODB idioms

Tim Peters tim@zope.com
Tue, 25 Jun 2002 02:24:57 -0400


[Casey Duncan]
> An interesting thought might be to attempt to rebalance BTrees to
> "ideal" at pack time. At least that would give you some chance to
> rectify unfortunate insert ordering.

I don't know that there's a real problem here, but that's a fine idea if
there is!  Will keep it in mind.

In the last Data.fs Jeremy examined in detail, all the BTrees were so small
that they contained exactly one BTree node and some buckets, and that's all.
IOW, the only BTrees simpler than that are empty ones, and the biggest one
was a factor of 70 smaller than needed to hit the next level of complexity.

> Another interesting thought would be a BTree visualization tool which
> created a graph from an existing BTree. or at least a method that showed
> some measure of balance so that we could test and see how adverse
> unbalanced trees really are.

The branching factor on Zope BTrees is so large that you'd need to display
up to 1000 children coming out of the top-level node.  A better measure of
balance is the min, max and average # of comparisons needed to find a key in
the tree.  I'll write an efficient method to compute that, provided I can
make time for it (BTW, note that the avg # of comparisons is mostly
independent of bucket fill ratio:  if bucket are only half full, it saves
one step of binary search at the bucket level but adds one at the level
above it (which has twice as many buckets to choose from) -- it's a wash --
and this is another reason to suspect that worrying about fill ratio is
misguided; it was important in *classic* BTrees because all nodes carried
exactly the same memory and disk burdens, but ours don't).

> I mean, what does a BTree look like if I insert 1000 elements in order?
> I think you described it, but it would be nice to see a pretty picture
> too...

If it's an II tree, there's one node at the top level, containing 16
pointers to children, each of which is an II bucket.  Each child except the
last contains 60 keys.  The last child contains 100.  That's it.  The
children have no children of their own.  You have to get 60,001 keys before
a second child level appears, and you need 15,015,061 keys before a third
child level appears.  AFAIK that's never happened in real life (although it
has in my test runs).  The stats are different and smaller for IO, OI, and
OO flavors, but are qualitatively the same (it's the same code, just some
#define'd constants vary).