[ZODB-Dev] ZODB idioms

Casey Duncan casey@zope.com
25 Jun 2002 09:14:54 -0400


Tim,

I'm glad you're hear to explain this to all of us. It would seem that I
can stop worrying now about the trees becoming severely unbalanced. Its
good to know that it can happen with lots of deletes though.

I look forward to some way to assess existing BTrees that have been
around a while to see how (un)balanced they really are.

Given the number of tiny BTrees that Jeremy found, would it make any
sense to optimize very small ones? Where might the cutoff be where a
simple dictionary is possibly a better choice (persistance not
withstanding)?

-Casey

On Tue, 2002-06-25 at 02:24, Tim Peters wrote:
> [Casey Duncan]
> > An interesting thought might be to attempt to rebalance BTrees to
> > "ideal" at pack time. At least that would give you some chance to
> > rectify unfortunate insert ordering.
> 
> I don't know that there's a real problem here, but that's a fine idea if
> there is!  Will keep it in mind.
> 
> In the last Data.fs Jeremy examined in detail, all the BTrees were so small
> that they contained exactly one BTree node and some buckets, and that's all.
> IOW, the only BTrees simpler than that are empty ones, and the biggest one
> was a factor of 70 smaller than needed to hit the next level of complexity.
> 
> > Another interesting thought would be a BTree visualization tool which
> > created a graph from an existing BTree. or at least a method that showed
> > some measure of balance so that we could test and see how adverse
> > unbalanced trees really are.
> 
> The branching factor on Zope BTrees is so large that you'd need to display
> up to 1000 children coming out of the top-level node.  A better measure of
> balance is the min, max and average # of comparisons needed to find a key in
> the tree.  I'll write an efficient method to compute that, provided I can
> make time for it (BTW, note that the avg # of comparisons is mostly
> independent of bucket fill ratio:  if bucket are only half full, it saves
> one step of binary search at the bucket level but adds one at the level
> above it (which has twice as many buckets to choose from) -- it's a wash --
> and this is another reason to suspect that worrying about fill ratio is
> misguided; it was important in *classic* BTrees because all nodes carried
> exactly the same memory and disk burdens, but ours don't).
> 
> > I mean, what does a BTree look like if I insert 1000 elements in order?
> > I think you described it, but it would be nice to see a pretty picture
> > too...
> 
> If it's an II tree, there's one node at the top level, containing 16
> pointers to children, each of which is an II bucket.  Each child except the
> last contains 60 keys.  The last child contains 100.  That's it.  The
> children have no children of their own.  You have to get 60,001 keys before
> a second child level appears, and you need 15,015,061 keys before a third
> child level appears.  AFAIK that's never happened in real life (although it
> has in my test runs).  The stats are different and smaller for IO, OI, and
> OO flavors, but are qualitatively the same (it's the same code, just some
> #define'd constants vary).
>