[ZODB-Dev] RE: [Zope-Annce] ZODB 3.2.4 release candidate 1 released

Thu Sep 16 11:43:43 EDT 2004

[Tim Peters]
>> ...
>> A key missing detail in the above is whether iobtree ever gets
>> committed, or whether it's solely in memory over its lifetime.

[Chris McDonough]
> Sorry, iobtree does get committed at various points over its lifetime but
> it's actually unknown whether it gets committed between the time a set of
> keys are added to it and when the key error happens for a key that is
> part of that set.  I haven't been able to reproduce it (unhelpful, I
> realize, but true).

It doesn't have to be for a key that's part of that set -- you're
underestimating just *how* screwy things can get if pieces of persistent
state get reverted by magic.  For example, adding a key can split a bucket:

    old_bucket -> mutated_old_bucket followed by
                  brand_new_bucket containing some of the old keys

and if brand_new_bucket gets lost then searching for one of the old keys
that got moved into it will fail.  It's quite possible that the parent node
will get its state reverted from the database, so that it no longer knows
anything about the new bucket.  Then the btree even looks internally
consistent at that level, although a pile of old keys has vanished from
cache state.

For simplicity (and sanity <wink>) I gave a concrete example with a single
bucket, but it can get much worse than that.
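
To make the single-bucket case concrete, here is a toy model of it in plain
Python.  It is only a sketch: lists stand in for persistent bucket state,
MAX_BUCKET_SIZE and the helper names are made up for illustration, and none
of this is the real BTrees code.

```python
# Toy model only: plain lists stand in for persistent IOBucket state.
MAX_BUCKET_SIZE = 4  # hypothetical limit; real buckets hold more keys


def insert(buckets, key):
    """Insert key into a sorted chain of sorted buckets, splitting on overflow."""
    # Pick the last bucket whose first key is <= key (else the first bucket).
    index = 0
    for i, bucket in enumerate(buckets):
        if bucket and bucket[0] <= key:
            index = i
    bucket = buckets[index]
    bucket.append(key)
    bucket.sort()
    if len(bucket) > MAX_BUCKET_SIZE:
        half = len(bucket) // 2
        # brand_new_bucket takes the upper half, including some old keys.
        buckets.insert(index + 1, bucket[half:])
        del bucket[half:]  # this is mutated_old_bucket


def contains(buckets, key):
    """Return True if any bucket in the chain holds key."""
    return any(key in bucket for bucket in buckets)


old_keys = [10, 20, 30, 40]
buckets = [list(old_keys)]        # old_bucket, already committed
insert(buckets, 25)               # adding one key forces a split
after_split = [list(b) for b in buckets]

# Simulate the bug: brand_new_bucket is lost and the parent reverts to its
# committed state, so it only knows about the (mutated) first bucket.
buckets_after_loss = [after_split[0]]

lost_old_keys = [k for k in old_keys if not contains(buckets_after_loss, k)]
```

In this toy run the keys that vanish are old ones (30 and 40), not just the
key that was being added, which is the point made above.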

> ...
> Yup, I didn't know if the value related to the missing key needed to be
> materialized at all given the example.  "No" is the answer I presume.

That's right.  BTree lookup btree[key] never unghosts any *values* in the
btree, not even the returned value.  When an IOBucket is first loaded, all
the integer keys are materialized, but all the values are ghosts.  The
values are never unghosted by anything until you do something that needs to
look *inside* the values.  Just looking up the value associated with a key
will return a ghost (unless that value has been materialized earlier for
some other reason).

>> The only relevant things that *can* be ghosted here are IOBTree and
>> IOBucket nodes.  Both certainly have to be unghosted to peer into their
>> contents.

> OK, it's helpful to know what must be unghosted to do this operation in
> any case.

Yup.  When you first load a top-level IOBTree, the integer keys in the
top-level IOBTree node are materialized, but the child nodes are all ghosts.
In general, nothing gets unghostified unless and until it *has* to be
unghostified.  Indeed, if you do anIOBTree[key] starting with an empty
cache, and anIOBTree contains a million interior nodes, the only things
unghosted are the btree and bucket nodes on the direct path from the root of
the btree to the bucket containing the key.  That's usually two or three
nodes total, maybe four for a very large tree.
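
The "only the direct path gets unghosted" behavior can be sketched with a toy
tree.  This is an illustration under stated assumptions, not the real
implementation: the Node class, its `ghost` flag and the `loaded` list are all
invented here, and the real nodes load their state from the storage rather
than flipping a flag.

```python
import bisect


class Node:
    """Toy stand-in for a persistent btree or bucket node (hypothetical)."""

    def __init__(self, name, keys, children=None):
        self.name = name
        self.keys = keys            # separator keys (interior) or stored keys (leaf)
        self.children = children    # None for a leaf bucket
        self.ghost = True           # state not loaded yet

    def unghost(self, loaded):
        # Record the first time this node's state is needed.
        if self.ghost:
            self.ghost = False
            loaded.append(self.name)


def lookup(root, key, loaded):
    """Walk from the root to one bucket, unghosting only nodes on the path."""
    node = root
    while True:
        node.unghost(loaded)
        if node.children is None:
            return key in node.keys
        # Keys >= a separator live in the child to its right.
        node = node.children[bisect.bisect_right(node.keys, key)]


b0 = Node("b0", [1, 5])
b1 = Node("b1", [10, 15])
b2 = Node("b2", [20, 25])
b3 = Node("b3", [30, 35])
i0 = Node("i0", [10], [b0, b1])
i1 = Node("i1", [30], [b2, b3])
root = Node("root", [20], [i0, i1])

loaded = []
found = lookup(root, 25, loaded)
still_ghosts = [n.name for n in (i0, b0, b1, b3) if n.ghost]
```

After the lookup, only root, i1 and b2 have been touched; every node off the
path is still a ghost.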

> ...
> Curious if this code (present in BTree_getm of BTreeTemplate.c) could do
> arguably the wrong thing in the face of a POSKeyError:
>
>   UNLESS (PyArg_ParseTuple(args, "O|O", &key, &d)) return NULL;
>   if ((r=_BTree_get(self, key, 0))) return r;
>   UNLESS (PyErr_ExceptionMatches(PyExc_KeyError)) return NULL;

Good eye!  If you do abtree.get(key, adefault), and an attempt is made to
unghostify a dangling interior btree or bucket node, I believe you're right,
this code will return `adefault`.

> From a quick reading of the docs a PyErr_ExceptionMatches also compares
> true if a base class matches.

Yes, it's the same logic as Python "except KeyError".  Note that the C code
makes no assumption about which kind of storage is in use, so the fact that
the storage API requires raising KeyError on a missing object makes this
deeply ambiguous.

> I mean, I guess it doesn't matter much (eventually you get None back).

Hiding serious errors always sucks.

> But it'd be nice to know if a POSKeyError resulted from this call
> instead of hiding it unintentionally (if that's indeed what it does).

It's more that the code needs to distinguish (but doesn't distinguish)
"KeyError because the key isn't present" from "KeyError because the storage
API requires raising KeyError if the btree is insane due to missing internal
nodes".
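
The ambiguity is easy to show in pure Python.  In this sketch
StorageKeyError is a hypothetical stand-in for a storage-level "object is
missing" error that derives from KeyError, ToyTree is invented for the
example, and strict_get is just one possible way to tell the two cases
apart, not something the btree code does today.

```python
class StorageKeyError(KeyError):
    """Hypothetical stand-in for a storage-level 'object missing' error."""


class ToyTree:
    """Invented mapping whose lookups can fail in two different ways."""

    def __init__(self, data, dangling=()):
        self._data = dict(data)
        self._dangling = set(dangling)  # keys whose bucket cannot be loaded

    def __getitem__(self, key):
        if key in self._dangling:
            # Models failing to unghostify a dangling internal node.
            raise StorageKeyError("missing object on path to %r" % (key,))
        return self._data[key]  # plain KeyError if the key isn't present

    def get(self, key, default=None):
        # Same shape as the C code: any KeyError means "use the default".
        try:
            return self[key]
        except KeyError:
            return default

    def strict_get(self, key, default=None):
        # One possible fix: let the storage-level error propagate.
        try:
            return self[key]
        except StorageKeyError:
            raise
        except KeyError:
            return default


tree = ToyTree({1: "a"}, dangling=[2])
hidden = tree.get(2, "adefault")        # storage error silently swallowed
absent = tree.get(3, "adefault")        # key genuinely not present

try:
    tree.strict_get(2, "adefault")
    strict_raised = False
except StorageKeyError:
    strict_raised = True
```

With get(), the caller cannot tell `hidden` from `absent`; with strict_get()
the damaged-tree case surfaces as an exception.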

> But probably not critical at all.

Dangling references are supposed to be impossible <wink>.  Tracking down
ways in which they occur anyway is critical, but there's really no reason to
suspect that they have any skill at hiding when they do occur.

When I have time to devote to this, I'd rather spend it on beefing up ZODB
to verify upon commit that the subobjects referenced by a modified object
actually exist in the database.  I believe Toby's DirectoryStorage makes
this kind of check, and it's a good idea.
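
A rough sketch of what such a commit-time check could look like, using plain
dicts in place of a storage.  Everything here is hypothetical (the function
name, the exception, and the oid -> referenced-oids layout); it says nothing
about how DirectoryStorage actually does it.

```python
class DanglingReferenceError(Exception):
    """Hypothetical error: a commit would store a reference to nothing."""


def check_references(committed, pending):
    """Verify that every oid referenced by a pending object exists.

    committed: dict mapping oid -> set of referenced oids already on disk.
    pending:   dict mapping oid -> set of referenced oids in this commit.
    """
    # An object may legitimately refer to another object created in the
    # same transaction, so both sets of oids count as "known".
    known = set(committed) | set(pending)
    for oid in sorted(pending):
        missing = sorted(pending[oid] - known)
        if missing:
            raise DanglingReferenceError(
                "object %r refers to missing oids %r" % (oid, missing))


committed = {1: {2}, 2: set()}

# Fine: modified parent 1 refers to bucket 3, which is in the same commit.
check_references(committed, {1: {2, 3}, 3: set()})
good_commit_ok = True

# Bad: parent 1 refers to bucket 3, but bucket 3's state has vanished.
try:
    check_references(committed, {1: {2, 3}})
    bad_commit_rejected = False
except DanglingReferenceError:
    bad_commit_rejected = True
```

The second call models the failure discussed in this thread: the parent is
about to be written with a reference to a new node that never reaches the
database.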

...

>> Here's code that breaks, provided you use a pre-repaired ZODB 3.2.  It's
>> a minor variant of code I posted before.

...

> Whew.  That must have taken some work to think up. ;-)

It took some work the first time, when I posted the code showing how
POSKeyError can get provoked.  The changes to provoke a "missing key" error
instead were trivial.

The point is that the "subtxn commit, close, open" dance could make cache
state vanish.  If it was cache state for a newly created object, then
POSKeyError is the outcome.  If it was cache state for a *modified*
pre-existing object, then you get the pre-modified state back.  There's no
limit on how perverse the symptoms could get.  For example, while I haven't
posted an example for this one, it's also possible to end up *committing*
dangling references, so that they become a permanent part of on-disk
database state (which I think is highly relevant to Richard Jones's current
thread on zope-dev).

> At the very least, I now have a decoy explanation until we find the
> next case! ;-)

This one is so bad it can explain anything.  Happily, we've fixed "the last
bug" of that kind so many times there can't be any more of them <wink>.