[ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)

Chris McDonough chrism at plope.com
Thu Mar 4 00:35:49 EST 2004


On Wed, 2004-03-03 at 23:36, Tim Peters wrote:
> So what are we left with?  Cases of "spontaneous corruption" are usually
> pinned on pilot error, but since you're using an IxBTree flavor that's hard
> to swallow.  Other cases have been pinned on the BTree, ZODB, and ZEO
> implementations.  We haven't managed to blame any on the BTree
> implementation in well over a year, and that's been subjected to extreme
> stress testing since then.  Several subtle timing holes have been plugged in
> both ZODB and ZEO since then.  Which version of ZODB is in use, BTW?  I hope
> it's the most recent of whatever flavor is involved, else there's no mystery
> worth pursuing.  Your idea to try using FileStorage instead was stellar,
> since that's been by far the most heavily tested in former
> corruption-provoking ZODB and ZEO stress tests.

Well, shame on me for not knowing the obvious, but... Alex, what version
of Zope was the one you were using when you saw this error occur?  I
think I remember noting at some point that it was Zope 2.7.0, which
would imply ZODB 3.2.1, but revisiting our email conversations it's not
completely clear.

FWIW, I am going to try to wire up the ZODB invalidation tests to
TemporaryStorage and see what happens there too.  I suspect I will
find something bad; at least I hope I do.

> Something that could be very helpful is to add
> 
>      from BTrees.check import check, display
>      ...
> 
>      try:
>          self._check()
>          check(self._data)
>      except AssertionError:
>          display(self._data)
>          raise


I assume you mean self._data._check() in the second line there.
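
For concreteness, here is a minimal sketch of Tim's snippet with that
correction applied, wrapped in a helper.  The helper name check_btree is
made up for illustration, and check and display are passed in as
arguments (in real use they would be the functions imported from
BTrees.check) so the sketch itself depends on nothing outside the
standard library:

```python
def check_btree(tree, check, display):
    """Run both consistency checks on `tree`; show its structure on failure.

    `check` and `display` are expected to behave like the functions of the
    same names in BTrees.check (an assumption of this sketch).
    """
    try:
        tree._check()    # the tree's own internal consistency check
        check(tree)      # the structural check, e.g. BTrees.check.check
    except AssertionError:
        display(tree)    # show the internal structure of the damaged tree
        raise            # re-raise so the original failure is not hidden
```

In the session code this would be called as something like
check_btree(self._data, check, display) near where the KeyError shows up.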

> 
> That will show the internal structure of the BTree if it's damaged.  The
> last several cases of corruption due to timing holes in invalidation
> invariably resulted in BTrees with one bucket ending with something like
> 
>     ... 31 32 33 45 46 47
> 
> and then the next bucket starting with something like
> 
>     34 35 36 37 38 39 ...
> 
> The check() function complains about the 45 46 47 in the first bucket then,
> because they're larger than the 34 that starts the second bucket.  This can
> happen when a bucket splits, and invalidation doesn't manage to force new
> copies of all of {bucket that split, the other bucket it split into, the
> parent node of the bucket that split} to get loaded.  If you see something
> like that again, it will make Jeremy's day <wink>.
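
To make the symptom Tim describes concrete, here is a small sketch of the
ordering rule that is violated.  It is not how BTrees.check is
implemented; plain Python lists stand in for buckets, and the function
name is made up for illustration:

```python
def find_bucket_order_violations(buckets):
    """Return (bucket_index, key) pairs that break the ordering rule.

    Every key in a bucket must be smaller than the first key of the
    bucket that follows it.
    """
    violations = []
    for i in range(len(buckets) - 1):
        next_bucket = buckets[i + 1]
        if not next_bucket:
            continue  # an empty neighbour imposes no upper bound
        upper_bound = next_bucket[0]
        for key in buckets[i]:
            if key >= upper_bound:
                violations.append((i, key))
    return violations


# The two buckets from the example above.
buckets = [[31, 32, 33, 45, 46, 47], [34, 35, 36, 37, 38, 39]]
print(find_bucket_order_violations(buckets))
# prints [(0, 45), (0, 46), (0, 47)]
```

The keys 45, 46 and 47 are reported because they are larger than 34, the
first key of the second bucket.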

Thanks Tim, you rock, as usual.  You get all that Alex? ;-)

BTW, Alex, I visited your website (http://www.zwarehouse.org) to get a
sense of what you're doing, and from that page I visited a site that you
list as a customer
(http://www.chalkface.com/catalog/html/custom/index_html?c_category_id=1),
and here's what I get splattered with there at the moment:

Site Error
An error was encountered while publishing this resource. 

Error Type: KeyError
Error Value: 1078319300

Ouch.  Looks familiar.  The good news is that it must be at least
nominally repeatable then. ;-)

So, to recap, the ball is in Alex's court.  He should try the following,
in this order:

- Upgrade to Zope 2.7.0 or Zope 2.6.4 if not there already.

- Use a FileStorage to back the temp_folder database instead
  of a TemporaryStorage.

- If KeyErrors continue to emanate from the code even with a
  FileStorage-backed temp_folder, insert the above code that Tim
  provided in a try/except block around where the error seems to occur
  and report back what comes up on the console.  If you don't want to
  stare at the console forever, send the results of display() to a file
  instead.
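
For the last step, one way to send the display() output to a file is
sketched below.  It assumes display() writes to standard output (it
prints the tree structure) and that a Python new enough to have
contextlib.redirect_stdout is in use; the helper name and the file path
are made up for illustration:

```python
import contextlib


def display_to_file(tree, display, path):
    """Append whatever display(tree) prints to the file at `path`.

    `display` is expected to print to standard output, as the display
    function in BTrees.check is assumed to do here.
    """
    # Append mode keeps earlier dumps, so repeated failures accumulate.
    with open(path, "a") as out, contextlib.redirect_stdout(out):
        display(tree)
```

Inside the except AssertionError: clause, a call such as
display_to_file(self._data, display, "/tmp/btree-dump.txt") would then
take the place of the bare display(self._data).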

HTH,

- C




