[ZODB-Dev] Re: [Zope-dev] [BUG] maybe yet unknown BTREES bug

Tim Peters tim@zope.com
Sun, 26 Jan 2003 11:41:35 -0500


[Tim]
> That one's a stretch <wink>.  Mutating while iterating may give
> surprising results, but it shouldn't be possible to *damage* a
> BTree this way ...

[Christian Reis]
> Can you elaborate a bit on what "surprising results" might mean? I don't
> think we do that in IC but it never hurts to know.

Did you ever mutate a Python list while iterating over it?  Same thing,
really -- you can end up "skipping" list elements, or even seeing some
elements twice.  A difference is that mutating a Python list while iterating
over it is much more predictable than doing so with a BTree, as a BTree's
internal structure is way more complicated than a Python list's.

Just try something and see what happens:

>>> t = OOBTree()
>>> for i in range(50):
...     t[i] = i
>>>

At this point we've got (0, 0) thru (49, 49) in the tree.  Now iterate over
t.keys() and simultaneously try deleting and adding some new pairs:

>>> for k in t.keys():
...     print k,
...     del t[k]
...     t[100+k] = 100+k
...
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
>>>

Every other key "got skipped" for some number of initial entries, and then
the tail end of the tree "got lost".  The tree content after may or may not
be what you expected, but if it *is* what you expected then you know an
awful lot about how BTrees are implemented <wink>:

>>> list(t.keys())
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35,
 37, 39, 41, 43, 45, 46, 47, 48, 49, 100, 102, 104, 106, 108, 110, 112,
 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140,
 142, 144]
>>>

The tree isn't damaged:

>>> t._check()   # internal pointer consistency checks don't complain
>>> from BTrees.check import check
>>> check(t)     # and neither do value-based consistency checks
>>>

The internal bucket structure explains the observed behavior, but the
internal bucket structure is difficult to predict, and even more difficult
to relate to the observed behavior:

>>> from BTrees.check import display
>>> display(t)
 OOBTree (0xecdfc0) with 4 children
    key 0: 15
    key 1: 30
    key 2: 45
    0 OOBucket (0xecd220) with 7 keys
        key 0: 1 value 1
        key 1: 3 value 3
        key 2: 5 value 5
        key 3: 7 value 7
        key 4: 9 value 9
        key 5: 11 value 11
        key 6: 13 value 13
    1 OOBucket (0xecd0a0) with 8 keys
        key 0: 15 value 15
        key 1: 17 value 17
        key 2: 19 value 19
        key 3: 21 value 21
        key 4: 23 value 23
        key 5: 25 value 25
        key 6: 27 value 27
        key 7: 29 value 29
    2 OOBucket (0xecdf70) with 7 keys
        key 0: 31 value 31
        key 1: 33 value 33
        key 2: 35 value 35
        key 3: 37 value 37
        key 4: 39 value 39
        key 5: 41 value 41
        key 6: 43 value 43
    3 OOBucket (0xecefb0) with 28 keys
        key 0: 45 value 45
        key 1: 46 value 46
        key 2: 47 value 47
        key 3: 48 value 48
        key 4: 49 value 49
        key 5: 100 value 100
        key 6: 102 value 102
        key 7: 104 value 104
        key 8: 106 value 106
        key 9: 108 value 108
        key 10: 110 value 110
        key 11: 112 value 112
        key 12: 114 value 114
        key 13: 116 value 116
        key 14: 118 value 118
        key 15: 120 value 120
        key 16: 122 value 122
        key 17: 124 value 124
        key 18: 126 value 126
        key 19: 128 value 128
        key 20: 130 value 130
        key 21: 132 value 132
        key 22: 134 value 134
        key 23: 136 value 136
        key 24: 138 value 138
        key 25: 140 value 140
        key 26: 142 value 142
        key 27: 144 value 144
>>>

It's even possible that the lazy t.keys() iterator object ends up with
pointers to buckets that no longer exist in the BTree, if you delete exactly
the right keys while iterating over t.keys().  That's difficult to provoke,
so it's left as an exercise for the reader.  You can end up "seeing" keys
that no longer exist in the BTree then, but again the BTree itself remains
sane (in current releases -- unlikely patterns of deletions could damage a
BTree before, and even if iteration wasn't involved).