[ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)

Chris McDonough chrism at plope.com
Wed Mar 3 15:15:14 EST 2004


On Wed, 2004-03-03 at 12:44, Jeremy Hylton wrote:
> >         for key in list(self._data.keys(None, max_ts)):
> >             assert(key <= max_ts)
> >             STRICT and _assert(self._data.has_key(key))
> >             for v in self._data[key].values():
> >                 to_notify.append(v)
> >             del self._data[key]
> 
> I don't have much context for this question.  It's definitely the case
> that in a corrupt BTree there can be keys you can reach using keys(),
> which follows the bucket next pointers, that you can't reach using a
> lookup, which follows child pointers down through the interior nodes.

Right, I figured as much.

> Could you call the check functions on the BTrees in question?  That's
> object._check() to check C internals and BTrees.check.check() to check
> value-based consistency.

I'm hoping Alex will do this for us as I haven't been able to make the
error occur in isolation.
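
For whoever ends up running the checks, here is a minimal sketch of how
the two calls Jeremy mentions might be wrapped.  The helper names
(run_btree_checks, check_timeslices) are made up for illustration and
are not part of Transience or BTrees.  It assumes both checks report
damage by raising AssertionError, which is my understanding of how they
behave.

```python
def run_btree_checks(tree):
    """Run both consistency checks on one BTree and return failure messages."""
    # Imported inside the function so that loading this helper does not
    # itself require the BTrees package.
    from BTrees.check import check

    problems = []
    try:
        tree._check()   # invariants of the C-level structure
    except AssertionError as exc:
        problems.append("_check(): %s" % exc)
    try:
        check(tree)     # value-based consistency of keys across nodes
    except AssertionError as exc:
        problems.append("BTrees.check.check(): %s" % exc)
    return problems


def check_timeslices(data):
    """Check the outer tree and every inner tree; return only the failures."""
    report = {"outer": run_btree_checks(data)}
    # items() walks the bucket chain, so it can still reach entries that a
    # lookup through the interior nodes would miss.
    for timeslice, sessions in data.items():
        report[timeslice] = run_btree_checks(sessions)
    return {name: found for name, found in report.items() if found}
```

Calling check_timeslices(self._data) from a debug session on the
affected instance should return an empty dict if both levels are sound.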

> So how is the BTree in question used?  If the test is failing here, it
> seems most likely that the BTree was corrupted by a write somewhere
> else.

The IOBTree in question (_data) is a mapping from an integer to an
OOBTree.  The integer key represents a time period value, the OOBTree
value is a mapping of session ids to session objects which were created
within this time period.  The _data BTree is of course written to
elsewhere in the code
(http://cvs.zope.org/*checkout*/Products/Transience/Transience.py?rev=1.32.12.2.2.2&content-type=text/plain).
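
To make the layout concrete, here is a small sketch of the structure
just described.  The timeslice values, session ids, and the function
names (build_example, sessions_up_to) are invented for illustration, and
plain strings stand in for real session objects.

```python
def build_example():
    """Build a tiny IOBTree of OOBTrees shaped like the _data mapping."""
    from BTrees.IOBTree import IOBTree
    from BTrees.OOBTree import OOBTree

    data = IOBTree()
    # Outer key: integer time period.  Inner mapping: session id -> session.
    layout = {
        0: {"sid-a": "session A"},
        20: {"sid-b": "session B", "sid-c": "session C"},
        40: {"sid-d": "session D"},
    }
    for timeslice, sessions in layout.items():
        inner = OOBTree()
        inner.update(sessions)
        data[timeslice] = inner
    return data


def sessions_up_to(data, max_ts):
    """Collect every session stored in a timeslice <= max_ts."""
    found = []
    # keys(None, max_ts) is a range search with no lower bound; the upper
    # bound is inclusive by default.
    for timeslice in data.keys(None, max_ts):
        found.extend(data[timeslice].values())
    return found
```

With this data, sessions_up_to(build_example(), 20) collects the
sessions from timeslices 0 and 20 and leaves out timeslice 40.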

If anybody (there's that *anybody* again, can you feel your ears burning,
Jeremy?) can grep that code for _data and see if there's someplace I'm
doing something suspicious with that BTree, I would be grateful.  The
last time I had a problem like this it turned out I was iterating over
the results of a BTree's keys() method (which returns a lazy object
rather than a list) and deleting those keys out of the BTree in the loop
body, which caused the issue.  But I've been careful not to do that, and
as far as I can tell nothing in the code does anything terribly dumb.  This
is corroborated by unit tests for that code, and the fact that the error
doesn't occur under "light" load; it appears to happen only in
situations where the transience code is executed in parallel by more
than one thread (thus leading me to believe it may have something to do
with conflict resolution).
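
For reference, this is roughly the pattern I mean by being careful: take
a snapshot of the keys with list() before deleting anything.  It is a
sketch only; expire_through and demo are illustrative names, and data is
assumed to behave like the IOBTree of OOBTrees described above.

```python
def expire_through(data, max_ts):
    """Delete every timeslice <= max_ts and return the values it held."""
    removed = []
    # list() materializes the lazy keys() result first, so the deletions
    # below never happen while the tree is still being traversed.
    for key in list(data.keys(None, max_ts)):
        removed.extend(data[key].values())
        del data[key]
    return removed


def demo():
    """Build a small tree, expire part of it, and report what is left."""
    from BTrees.IOBTree import IOBTree
    from BTrees.OOBTree import OOBTree

    data = IOBTree()
    for ts in (0, 20, 40, 60):
        inner = OOBTree()
        inner["sid-%d" % ts] = "session-%d" % ts
        data[ts] = inner
    removed = expire_through(data, 20)
    return removed, list(data.keys())
```

Dropping the list() call gives the unsafe variant, where the tree is
modified while its keys are still being walked.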

- C




