[ZODB-Dev] ZODB idioms

Tim Peters tim@zope.com
Wed, 26 Jun 2002 01:10:37 -0400


[Toby Dickenson]
> If its grossly unbalanced then it has to load more tree nodes
> into memory from disk.... possibly a whole BTree.
>
> As you said, its only a problem if it actually happens.

The good news-- maybe --is that it's not easy to make an unbalanced tree.
If you try to provoke one with a black-box approach, you're almost certain
not to succeed.  It *is* easy to provoke one with a white-box approach, like

"""
from BTrees.IIBTree import IITreeSet

def abuse():
    t = IITreeSet()
    ins = t.insert
    depth = t.has_key
    nuke = t.remove

    # Insert keys until we get to three levels.
    i = 0
    while 1:
       i += 1
       ins(i)
       if depth(i) == 3:
           break

    # Now the last key inserted will persist at
    # that level if we nuke everything before it.
    # (It so happens i is 60,001 now.)
    for j in range(1, i):
       nuke(j)

    assert t.has_key(i)
    print "Now the set has just", len(t), "element,",
    print "and it's at depth", depth(i)
    t._check()

abuse()
"""

More generally, you can use __getstate__ to examine the bucket structure,
and delete all keys *except* for those that are last in their bucket.  Then
keep adding increasing keys until a new bucket appears, delete all but its
last key, etc etc.  Keep this up and you can build a tree of arbitrary depth
with an arbitrarily high ratio of BTree nodes to keys.  But the "arbitrary"
there is a theoretical tease, as the number of insertions required to
provoke a new level becomes astronomical very quickly.

Against that, rebalancing on deletion is a delicate and expensive process.
Indeed, it appears that a correct algorithm for rebalancing deletion in a B+
tree didn't get published before 1995(!):

    Implementing Deletion in B+-trees
    J. Jannink
    http://dbpubs.stanford.edu:8090/pub/1995-19

That in turn kicked off a batch of papers trying to improve Jannink's
approach.  It appears difficult to do this right -- and it would increase
conflicts if it were done at all.