[ZODB-Dev] Re: [Zope-dev] [BUG] maybe yet unknown BTREES bug
Tim Peters
tim@zope.com
Sun, 26 Jan 2003 11:41:35 -0500
[Tim]
> That one's a stretch <wink>. Mutating while iterating may give
> surprising results, but it shouldn't be possible to *damage* a
> BTree this way ...
[Christian Reis]
> Can you elaborate a bit on what "surprising results" might mean? I don't
> think we do that in IC but it never hurts to know.
Did you ever mutate a Python list while iterating over it? Same thing,
really -- you can end up "skipping" list elements, or even seeing some
elements twice. A difference is that mutating a Python list while iterating
over it is much more predictable than doing so with a BTree, as a BTree's
internal structure is way more complicated than a Python list's.
Just try something and see what happens:
>>> t = OOBTree()
>>> for i in range(50):
... t[i] = i
>>>
At this point we've got (0, 0) thru (49, 49) in the tree. Now iterate over
t.keys() and simultaneously try deleting and adding some new pairs:
>>> for k in t.keys():
... print k,
... del t[k]
... t[100+k] = 100+k
...
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
>>>
Every other key "got skipped" for some number of initial entries, and then
the tail end of the tree "got lost". The tree content after may or may not
be what you expected, but if it *is* what you expected then you know an
awful lot about how BTrees are implemented <wink>:
>>> list(t.keys())
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35,
37, 39, 41, 43, 45, 46, 47, 48, 49, 100, 102, 104, 106, 108, 110, 112,
114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140,
142, 144]
>>>
The tree isn't damaged:
>>> t._check() # internal pointer consistency checks don't complain
>>> from BTrees.check import check
>>> check(t) # and neither do value-based consistency checks
>>>
The internal bucket structure explains the observed behavior, but the
internal bucket structure is difficult to predict, and even more difficult
to relate to the observed behavior:
>>> from BTrees.check import display
>>> display(t)
OOBTree (0xecdfc0) with 4 children
key 0: 15
key 1: 30
key 2: 45
0 OOBucket (0xecd220) with 7 keys
key 0: 1 value 1
key 1: 3 value 3
key 2: 5 value 5
key 3: 7 value 7
key 4: 9 value 9
key 5: 11 value 11
key 6: 13 value 13
1 OOBucket (0xecd0a0) with 8 keys
key 0: 15 value 15
key 1: 17 value 17
key 2: 19 value 19
key 3: 21 value 21
key 4: 23 value 23
key 5: 25 value 25
key 6: 27 value 27
key 7: 29 value 29
2 OOBucket (0xecdf70) with 7 keys
key 0: 31 value 31
key 1: 33 value 33
key 2: 35 value 35
key 3: 37 value 37
key 4: 39 value 39
key 5: 41 value 41
key 6: 43 value 43
3 OOBucket (0xecefb0) with 28 keys
key 0: 45 value 45
key 1: 46 value 46
key 2: 47 value 47
key 3: 48 value 48
key 4: 49 value 49
key 5: 100 value 100
key 6: 102 value 102
key 7: 104 value 104
key 8: 106 value 106
key 9: 108 value 108
key 10: 110 value 110
key 11: 112 value 112
key 12: 114 value 114
key 13: 116 value 116
key 14: 118 value 118
key 15: 120 value 120
key 16: 122 value 122
key 17: 124 value 124
key 18: 126 value 126
key 19: 128 value 128
key 20: 130 value 130
key 21: 132 value 132
key 22: 134 value 134
key 23: 136 value 136
key 24: 138 value 138
key 25: 140 value 140
key 26: 142 value 142
key 27: 144 value 144
>>>
It's even possible that the lazy t.keys() iterator object ends up with
pointers to buckets that no longer exist in the BTree, if you delete exactly
the right keys while iterating over t.keys(). That's difficult to provoke,
so it's left as an exercise for the reader. You can end up "seeing" keys
that no longer exist in the BTree then, but again the BTree itself remains
sane (in current releases -- unlikely patterns of deletions could damage a
BTree before, and even if iteration wasn't involved).