[ZODB-Dev] Re: BTrees q [Fwd: [Zope-dev] More Transience weirdness in 2.7.1b1]

Tim Peters tim at zope.com
Thu Jun 3 03:18:46 EDT 2004


[Chris McDonough]
> Chris asks a lot of questions below and has few answers.

Don't fret, I'll make stuff up <0.9 wink>.

[Tim Peters]
...
>> This check wasn't always there.  It was
>> introduced after Steve Alexander provoked segfaults with code like this:
>>
>>     thekeys = somebtree.keys()
>>     while True:
>>         del somebtree[thekeys[0]]
>>
>> ... testDamagedIterator() was added
>> when this was fixed (== when the currentoffset check was added), to
>> ensure that such code raises the "changed size" exception instead.
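(For comparison, plain Python dicts have the same kind of guard -- here's a
rough analogue, with a dict standing in for the BTree, that trips the check
rather than crashing:)

```python
# Rough analogue with a plain dict (not BTrees): CPython detects the
# same kind of mutation-during-iteration and raises instead of segfaulting.
d = {1: "a", 2: "b", 3: "c"}
raised = False
try:
    for k in d:
        del d[k]          # mutating while iterating
except RuntimeError:
    raised = True         # "dictionary changed size during iteration"
assert raised
```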

> Would it make sense to change the exception raised there in the mainline
> code to include the values of currentoffset, currentbucket->len, delta,
> and pseudoindex?  I can't imagine that this code gets hit very often, and
> having those values in the actual exception message would make it
> possible to get some insight into the bucket state without requiring
> someone to run a debug version of BTrees to capture the info.

Feel free to add stuff.  I don't expect it will help much, but who knows.
There have been only 3 ways this code has been hit:  people deliberately
trying to provoke segfaults (that's what Steve was doing at the time);
people mutating while iterating (and then it's usually obvious); and this
unique report of it happening "for no reason at all".  The extra info
probably wouldn't be of any use except in the last of those cases; can't
tell whether it would help in that case either, until it's tried.

>> ... Alas, ZODB's C-level ThreadLock objects release the GIL when
>> you try to acquire one,

> Not that I have any clue about this stuff, but do you know why
> ThreadLocks do this?

They have to.  A ThreadLock contains a real lock, and part of acquiring a
ThreadLock is acquiring the real lock it contains.  If the latter already
happens to be acquired by some other thread, and the C ThreadLock
implementation didn't release the GIL before trying to acquire it for
itself, then the whole system would deadlock (the non-ThreadLock threads
couldn't get any Python cycles because the ThreadLock thread would be
holding the GIL, and the ThreadLock thread would wait forever for a
non-ThreadLock thread to release the real lock it's trying to acquire).
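You can see the effect from pure Python (a sketch using threading.Lock, not
ZODB's ThreadLock): a thread blocked inside acquire() still lets other
threads execute Python, which is only possible because the C-level acquire
drops the GIL before blocking.

```python
import threading
import time

lock = threading.Lock()
lock.acquire()                             # main thread holds the real lock

t = threading.Thread(target=lock.acquire)  # will block inside C acquire()
t.start()
time.sleep(0.1)                            # let the thread reach the block

# If the blocked acquire() kept the GIL, the next line could never run,
# and nothing could ever release the lock: a system-wide deadlock.
total = sum(range(100))

lock.release()                             # unblocks the waiting thread
t.join()
lock.release()                             # balance the thread's acquire
```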

> Do Python RLocks do this?

Not directly, but same effect for a different reason:  RLocks are coded in
Python, so other threads can sneak in between any pair of bytecodes.

...

> ... It's possible that at some point before this
> code ran some freak did "__builtins__['list'] = my_cooler_list_func", but
> somehow I feel silly asking Steve to make sure of it; I guess I should.
> ;-)

Ya, it's extremely unlikely.  Then again, this bug is impossible <wink>.

> (FWIW, Tres has threatened to do this for hasattr given that it
> silently swallows all exceptions!  We'll see if he pulls the trigger ;-)

I think that's a good idea for Zope!  I'm sure that when Guido designed the
attribute introspection machinery, he never dreamed that merely asking
whether an attribute existed could end up sucking megabytes out of a
database halfway around the world, creating a thousand new objects as a side
effect, and potentially causing catastrophic problems.  Zope and ZODB do
incredibly complex stuff as side effects in what Guido surely thought of as
"tiny hooks".  He had in mind that hasattr() might look for a key in a dict
or list, and suppress a harmless KeyError or IndexError, not start World War
III and then send an endless sequence of Terminators back in time to change
the outcome <wink>.
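(Worth spelling out what "silently swallows" means.  In today's Python,
hasattr() only suppresses AttributeError -- the builtin of that era swallowed
everything -- but even the narrowed version can hide arbitrary work done by a
property or __getattr__ hook.  A sketch:)

```python
class P:
    @property
    def attr(self):
        # An arbitrarily expensive, side-effecting load could run here;
        # if it ends in AttributeError, hasattr() just reports "no".
        raise AttributeError("lost in the database")

# The exception is swallowed; the caller only sees False.
assert hasattr(P(), "attr") is False
```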

> ...
> You wouldn't be able to tell me how to get a stack trace out of this code
> in the circumstance that the exception is raised, would you?  Is this
> something that can be put into the code itself and printed conditionally
> or does it need to happen via gdb interactively?

If I were able to provoke this, I wouldn't raise the exception at all -- I'd
change the code to, say, dereference a NULL pointer at this point, so I'd
get a nice core dump file to crawl over in gdb for a month.

...

> Could the

Almost certainly it

...

> It makes me want to think that the only thing that can be happening here
> is memory corruption of some kind (whether due to hw failure or compiler
> wrongness).

Or a wild store in C code.  ZODB, Zope and Python have all been known to do
that.  If the OP is running any C extensions, they're suspect too.

> Steve has said that the machine runs fine though; it's apparently old
> but trustworthy.

I worked for CPU manufacturers in two previous lives, and there's no such
thing as trustworthy HW at the prices we pay.  Sad to say, they're still
more reliable than SW, though.

> If it were hardware I'd suspect it would be flaking out otherwise too.
> I've asked him to try to run the code on a different box.

Good sanity check!

>> Now if the cache had a bucket of, say, length 20 when list() began, and
>> *while* the list() call was iterating over the bucket its guts magically
>> got replaced by a bucket of, say, length 10, then we could get this
>> exception. ZODB doesn't act on invalidations until a transaction
>> boundary, though.

> Unless a ReadConflictError is raised, right?  In that case, the object is
> put into the invalid objects list right away.

By "act on", I really meant "fetch fresh data".  If this is the theory, we
have to account for the guts of a single object getting replaced, "in the
middle" of a transaction.  If we already loaded the bucket before an
invalidation appears, ZODB sticks the oid in the set of invalidated objects,
but doesn't refetch the data.  Hanging on to the data it already has is the
heart of supplying a consistent database view in the absence of MVCC.  If we
got an invalidation before loading the bucket, then trying to load the
bucket raises ReadConflictError.  In neither case does the state get
*replaced* mid-transaction.  The only way state is supposed to change is via
loading for the first time in a transaction, or via user code mutating it.
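If it helps, here's a toy model of that bookkeeping -- invented names,
nothing like ZODB's real classes, just the shape of the rules above:

```python
class ReadConflictError(Exception):
    pass

class ToyConnection:
    """Toy model of per-transaction invalidation bookkeeping
    (invented names, not ZODB's real API)."""

    def __init__(self, storage):
        self.storage = storage       # oid -> state; stands in for the DB
        self.loaded = {}             # states fetched in this transaction
        self.invalidated = set()     # oids invalidated since we started

    def invalidate(self, oid):
        # Record the invalidation but do NOT refetch: hanging on to data
        # already loaded is what gives a consistent view without MVCC.
        self.invalidated.add(oid)

    def load(self, oid):
        if oid in self.loaded:
            return self.loaded[oid]  # never replaced mid-transaction
        if oid in self.invalidated:
            raise ReadConflictError(oid)
        self.loaded[oid] = self.storage[oid]
        return self.loaded[oid]

conn = ToyConnection({"bucket": [1, 2, 3]})
state = conn.load("bucket")
conn.invalidate("bucket")
assert conn.load("bucket") is state  # loaded first: keep the old state
```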

> Is it possible that the list() implementation (or the BTreesItem_seek
> code itself, or the other BTrees code that calls into it) could swallow a
> conflict error (maybe it calls into hasattr! ;-)?

By eyeball, no.  There are no hooks in the implementation of BTrees to
replace the C code you see there, and a BTreeItems object really contains
little except handles into the start and end of the range it represents.
There's nothing there except a bunch of C pointers linking the bucket chain.
Since it's an IOBTree and we're iterating over the keys, there's no reason
to look at the BTree's *values* at all, and operations on integer keys are
(a) all in C, and (b) unexceptional.  The list() implementation doesn't hide
exceptions regardless, except for IndexError (which is the way old-style
iteration communicates that it's done) and StopIteration (the new-style
way).
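That last bit is easy to see from Python: IndexError is simply how the
old-style protocol says "done":

```python
class ThreeItems:
    # Old-style iteration: list() calls __getitem__ with 0, 1, 2, ...
    # until IndexError signals the end -- that's the one exception the
    # machinery deliberately swallows.
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i * 10

assert list(ThreeItems()) == [0, 10, 20]
```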

> Might that provoke insanity in the index state as it attempts to
> walk over the set of buckets that make up the items (each one of which
> I presume is an independent persistent object)?

Each bucket is a distinct persistent object.  I hesitate to say
"independent" because they're related to each other by three different kinds
of C pointers.  Forward iteration follows only one of those kinds, the
buckets' "next" pointers, which is by far the simplest of the three kinds,
and so the hardest to screw up (the buckets are chained in a singly-linked,
NULL terminated, flat list via "next").
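So forward iteration amounts to no more than this (a toy sketch of the
shape, not the real C):

```python
class Bucket:
    # Toy stand-in for a BTree bucket: some keys plus a "next" pointer
    # forming a singly-linked, None-terminated, flat chain.
    def __init__(self, keys, next=None):
        self.keys = keys
        self.next = next

def iterkeys(bucket):
    # Forward iteration follows only the buckets' "next" pointers.
    while bucket is not None:
        for k in bucket.keys:
            yield k
        bucket = bucket.next

b3 = Bucket([7, 9])
b2 = Bucket([4, 5], b3)
b1 = Bucket([1, 2], b2)
assert list(iterkeys(b1)) == [1, 2, 4, 5, 7, 9]
```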

If this is a bug in the C BTreeItems code, this is the most likely way:
every time a field of a persistent object P is referenced at the C level, P
must be the first argument to a PER_USE_OR_RETURN() macro (or moral
equivalent) first.  This is to ensure that P gets loaded, in case P is a
ghost.  If P is a ghost but the attempt to load it fails, the second macro
argument becomes the instant return value of the function from which the
macro was invoked.  If C code doesn't do this dance first, and P is a ghost,
then referencing attributes of P will read up trash memory, and anything can
happen then.  If C code does do this dance, but screws up the return value
(indicating success instead of failure), I suppose anything could happen
too.
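In Python terms the dance looks roughly like this (a sketch; the real thing
is the C macro, and all the names here are made up):

```python
class Ghost:
    # Toy persistent object: no state until loaded (a "ghost").
    def __init__(self, loader):
        self._loader = loader
        self.state = None

def with_loaded(obj, func, error_value):
    # Rough Python rendering of the PER_USE_OR_RETURN() dance: make sure
    # the ghost's state is loaded before touching any field, and bail out
    # with the caller-designated error value if the load fails.
    if obj.state is None:
        try:
            obj.state = obj._loader()
        except Exception:
            return error_value       # the "instant return" on failure
    return func(obj.state)

def broken_load():
    raise IOError("storage unavailable")

good = Ghost(lambda: {"len": 20})
bad = Ghost(broken_load)
assert with_loaded(good, lambda s: s["len"], -1) == 20
assert with_loaded(bad, lambda s: s["len"], -1) == -1
```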

However, BTreeItems_item() and BTreeItems_seek() (which are the only C BTree
routines involved in forward iteration) are extraordinarily *simple* in this
respect, only needing to activate a single persistent object (the current
bucket) at a time.  The logic for this is "obviously correct by eyeball".
Other parts of the BTree code are much more complicated in this respect.

> I suggest this because this code is really the only "in the wild" code I
> know of that tries to use ZODB as if it's OK to actually do a lot of
> concurrent reading and writing of the same data structures.

It's certainly possible that additional relevant bugs are hiding in BTrees,
ZODB and ZEO.  For example, maybe a bucket's "next" pointer can end up
pointing at recycled memory.  Then it would be perfectly understandable that
this memory would change "for no reason at all".  OTOH, the fact that a
bucket's next pointer points at an object is supposed to keep that object
intact, by keeping its refcount > 0 for as long as it is pointed at.  So
even something as mundane as a refcount bug *could* account for this.

OTOH, a "too small" refcount bug in the BTree code would likely have been
noticed hundreds of times by now (the only refcount bugs I ever found in the
BTree code, and there were more than a few, were "too large" refcount bugs;
the only symptom that has is memory leakage).
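In CPython terms, a "too large" refcount bug acts like an extra reference
nobody ever drops -- the object simply can't die (a sketch using an explicit
stray reference to stand in for the stray Py_INCREF):

```python
import weakref

class Node:
    pass

n = Node()
alive = weakref.ref(n)  # lets us observe whether the object is collected
extra = [n]             # stands in for a "too large" refcount bug:
                        # one INCREF that nobody ever undoes
del n
assert alive() is not None   # the object can never die: a pure memory leak
extra.clear()
assert alive() is None       # drop the stray reference and it's collected
```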

Your turn <wink>.



