[ZODB-Dev] Re: BTrees q [Fwd: [Zope-dev] More Transience weirdness in 2.7.1b1]

Wed Jun 2 22:21:10 EDT 2004

[Casey Duncan]
> This error occurs if the current offset in the current bucket in
> BTreeItems_seek is either negative or greater than the largest index in
> the bucket. The code assumes that this can only happen if the bucket was
> mutated between calls to BTreeItems_seek.

It's more of a required invariant for sane operation, and tons of code
cooperates in ensuring its truth, including that the BTree code never
creates a bucket with length 0 (well, not that you can see:  it unlinks any
empty buckets that may occur before it returns).  So it's something I'd like
to assert instead.  Alas, user code can break the invariant, so it has to be
checked.  This check wasn't always there.  It was introduced after Steve
Alexander provoked segfaults with code like this:

    thekeys = somebtree.keys()
    while True:
        del somebtree[thekeys[0]]

That leaves "currentoffset" and "pseudoindex" at 0 the whole time, but
eventually empties the first bucket entirely, leaving a currentoffset of 0
pointing at a then-deallocated key.  testDamagedIterator() was added when
this was fixed (== when the currentoffset check was added), to ensure that
such code raises the "changed size" exception instead.
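
For anyone who wants to see the shape of that check without reading the C, here is a rough pure-Python model. It is only a sketch. The `Bucket` and `Items` classes are made up for illustration, it only seeks forward, and the exception text is my recollection of what the C code says, not a quote. Unlike the real BTree code, nothing here unlinks an emptied bucket, which is exactly what lets the stale offset be observed.

```python
class Bucket:
    """Hypothetical stand-in for a BTree bucket: some keys plus a next pointer."""
    def __init__(self, keys, next=None):
        self.keys = list(keys)
        self.next = next


class Items:
    """Toy model of a lazy items sequence that remembers its last position."""
    def __init__(self, first):
        self.currentbucket = first
        self.currentoffset = 0
        self.pseudoindex = 0

    def seek(self, i):
        bucket, offset = self.currentbucket, self.currentoffset
        # The purely local invariant: the offset remembered from the
        # previous call must still be a legal index into the bucket.
        if not 0 <= offset < len(bucket.keys):
            raise RuntimeError("the bucket being iterated changed size")
        delta = i - self.pseudoindex
        if delta < 0:
            raise ValueError("this toy model only seeks forward")
        while delta > 0:
            room = len(bucket.keys) - 1 - offset  # steps left in this bucket
            if delta <= room:
                offset += delta
                delta = 0
            else:
                if bucket.next is None:
                    raise IndexError(i)
                delta -= room + 1
                bucket, offset = bucket.next, 0
        self.currentbucket, self.currentoffset, self.pseudoindex = bucket, offset, i
        return bucket.keys[offset]


first = Bucket([1, 2], Bucket([3, 4]))
items = Items(first)
items.seek(0)            # position is (first bucket, offset 0)
del first.keys[0]        # like "del somebtree[thekeys[0]]"
items.seek(0)            # still fine, the bucket has one key left
del first.keys[0]        # first bucket is now empty
try:
    items.seek(0)
    outcome = "no error"
except RuntimeError as exc:
    outcome = str(exc)
```

The offset and pseudoindex stay at 0 throughout, as in Steve's loop, and the check is the only thing standing between the third seek and an out-of-range read.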

> Since you are just iterating it, no such change could be occurring
> (unless the reference is shared between threads somehow),

It's tempting to believe it's even worse than that:  the Python GIL is held
for the duration of the list() call, so it "should be" that no other
Python-created thread *can* be running in the same process until the list()
call completes.  Alas, ZODB's C-level ThreadLock objects release the GIL
when you try to acquire one, and I suppose it's possible that persistent
loads needed to suck up buckets end up doing that in some convoluted way.

> so I suspect foul play.

Or Chris redefined list() to "lambda x: x" <wink>.

> It might be interesting to instrument the BTreeItems_seek() function in
> BTreeItemsTemplate.c to spew the values of i, currentoffset and
> currentbucket->len when it blows up. Then you could see whether it dies
> at the beginning, end or somewhere in the middle and what bucket and
> index it is on when it does.

Yup.  Would also be good to get a debugger stack trace at this point.  I'm
assuming the exception occurs during the list() call because nothing else
makes sense; however, it doesn't make sense that the exception occurs during
the list() call either, so all assumptions are suspect.

> ...
> Interesting that restarting Zope cured this error. There must be some
> inconsistency in the cached state then.

Note that the consistency check here is within a single bucket, i.e. within
a single persistent object.  Almost all potential problems with BTrees have
to do with the *mutual* consistency of multiple bucket and parent nodes, in
how they relate to each other.  But this check is purely local.  Even if the
BTree is insane as a whole, a single bucket within it can't get inconsistent
with itself.

Now if the cache had a bucket of, say, length 20 when list() began, and
*while* the list() call was iterating over the bucket its guts magically got
replaced by a bucket of, say, length 10, then we could get this exception.
ZODB doesn't act on invalidations until a transaction boundary, though.
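
To make that hypothetical concrete, here is a small simulation in plain Python. It uses no ZODB at all. The function name `iterate_with_swap` and the mechanism of swapping the key list mid-loop are invented for illustration, and the exception text is again from memory. It pretends a 20-key bucket has its state replaced by a 10-key one partway through a walk.

```python
def iterate_with_swap(swap_at):
    """Walk a 20-key bucket; before step number swap_at, replace its keys
    with a 10-key list, as if the bucket's state had been reloaded."""
    keys = list(range(20))
    currentoffset = 0          # position left behind by the previous step
    seen = [keys[currentoffset]]
    step = 1
    while True:
        if step == swap_at:
            keys = list(range(10))   # the "magic" replacement
        # Same local check as before: is the remembered offset still legal?
        if not 0 <= currentoffset < len(keys):
            raise RuntimeError("the bucket being iterated changed size")
        if currentoffset + 1 >= len(keys):
            return seen              # ran off the end of this bucket
        currentoffset += 1
        seen.append(keys[currentoffset])
        step += 1


normal = iterate_with_swap(None)     # never swapped, all 20 keys seen
early = iterate_with_swap(5)         # swapped while offset was 4
try:
    iterate_with_swap(15)            # swapped while offset was 14
    late = "no error"
except RuntimeError as exc:
    late = str(exc)
```

In this model the exception only appears when the swap happens after the walk has already passed the new length. An earlier swap just yields a shorter result with no complaint, so the check would catch only some replacements of this kind.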