[ZODB-Dev] ZODB idioms
Tim Peters
tim@zope.com
Wed, 26 Jun 2002 01:10:37 -0400
[Toby Dickenson]
> If its grossly unbalanced then it has to load more tree nodes
> into memory from disk.... possibly a whole BTree.
>
> As you said, its only a problem if it actually happens.
The good news-- maybe --is that it's not easy to make an unbalanced tree.
If you try to provoke one with a black-box approach, you're almost certain
not to succeed. It *is* easy to provoke one with a white-box approach, like
"""
from BTrees.IIBTree import IITreeSet
def abuse():
t = IITreeSet()
ins = t.insert
depth = t.has_key
nuke = t.remove
# Insert keys until we get to three levels.
i = 0
while 1:
i += 1
ins(i)
if depth(i) == 3:
break
# Now the last key inserted will persist at
# that level if we nuke everything before it.
# (It so happens i is 60,001 now.)
for j in range(1, i):
nuke(j)
assert t.has_key(i)
print "Now the set has just", len(t), "element,",
print "and it's at depth", depth(i)
t._check()
abuse()
"""
More generally, you can use __getstate__ to examine the bucket structure,
and delete all keys *except* for those that are last in their bucket. Then
keep adding increasing keys until a new bucket appears, delete all but its
last key, etc etc. Keep this up and you can build a tree of arbitrary depth
with an arbitrarily high ratio of BTree nodes to keys. But the "arbitrary"
there is a theoretical tease, as the number of insertions required to
provoke a new level becomes astronomical very quickly.
Against that, rebalancing on deletion is a delicate and expensive process.
Indeed, it appears that a correct algorithm for rebalancing deletion in a B+
tree didn't get published before 1995(!):
Implementing Deletion in B+-trees
J. Jannink
http://dbpubs.stanford.edu:8090/pub/1995-19
That in turn kicked off a batch of papers trying to improve Jannink's
approach. It appears difficult to do this right -- and it would increase
conflicts if it were done at all.