[ZODB-Dev] Potential BTrees splitting bug

Dieter Maurer dieter at handshake.de
Wed Sep 24 21:43:11 EDT 2003


Tim Peters wrote at 2003-9-24 15:10 -0400:
 > ...
 > The patch may be, or contribute to, the problem.  Could you please try it
 > without your patch too.

This will not be so easy ...

 > > I looked at about 40 failures. All followed the same pattern:
 > >
 > >   It was always the last key of a 17 element bucket that were
 > > unordered.
 > 
 > Sorry, I'm not clear on what that means.

I thought I had already been clear, but apparently I was wrong:

The failure occurs in the "BTrees" consistency check, which finds
an unordered key
("key KKK larger than upper bound BBB, index III, OOBucket XXX at path PPP").

The index ("III") is always 16; the path ("PPP") varies but always
points to a bucket with 17 elements.
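
To make the kind of check concrete, here is a minimal Python sketch of a
per-bucket test. It is not the actual "BTrees" check code: the helper name
"find_misordered", the message wording, and the convention that a bucket
holds keys with lower <= key < upper are all assumptions made for the
illustration, and the keys are made up.

```python
def find_misordered(keys, lower, upper):
    """Return (index, key, reason) for each key violating order or bounds.

    Hypothetical helper, not part of BTrees. Assumes a bucket should hold
    strictly increasing keys with lower <= key < upper; a bound of None
    means "unbounded on that side".
    """
    problems = []
    for i, key in enumerate(keys):
        if lower is not None and key < lower:
            problems.append((i, key, "< lower bound %r" % (lower,)))
        if upper is not None and key >= upper:
            problems.append((i, key, ">= upper bound %r" % (upper,)))
        if i > 0 and key <= keys[i - 1]:
            problems.append((i, key, "not larger than previous key"))
    return problems


# A made-up 17 key bucket whose last key lies beyond the upper bound.
bucket_keys = list(range(100, 116)) + [130]
for index, key, reason in find_misordered(bucket_keys, 100, 116):
    print("key %r %s, index %d" % (key, reason, index))
```

With these made-up keys, only index 16 is reported, which is the shape of
the failures described here.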

 > Was the failure that a thread
 > believed it had added a key to the BTree but that the key wasn't actually
 > found there later?

No.
The failure is in the preceding consistency check.

 > Did the test produce any output when it failed (and if
 > so, can we see it -- the most recent versions of the tests do an exhaustive
 > dump of the BTree to stdout when they fail, which is most helpful)?

I use such a version, and it did indeed dump the tree.

Unfortunately, it dumped it to "stdout" and not to "stderr".
Therefore, it did not end up in the ("testrunner") log file.

But, I looked at these dumps (about 40).

  *  Each looked a bit different (which is to be expected)

  *  They looked good apart from the isolated misordering
     faithfully reported by the consistency check.

     There were some missing elements (not many),
     almost surely caused by ZODB conflicts.

 > >   For me, it looked as if under some circumstances the last element of
 > > a splitting bucket were forgotten to be moved into the new bucket.
 > >
 > >
 > > The test passed always on Mono 1.5 Ghz AMD, SuSE Linux and otherwise
 > > identical parameters.
 > 
 > ZEO cache bugs are extremely sensitive to quirks of platform timing, so it's
 > not surprising that just changing boxes can change what you see.  For
 > example, the last batch of ZEO cache consistency bugs we fixed were
 > extremely easy to provoke on Win98SE(!), difficult to provoke on Win2K, and
 > seemingly impossible to provoke on Red Hat Linux, with comparable hardware
 > in all cases.

To me, it does not look like a cache bug.
There should *never* have been a bucket with such content.

A failure looks, e.g., like this (from memory):

  ....
  PPP: OOBucket with 17 keys (bounds 190, 206 -- taken from "OOBTrees" keys)
       0: key 190, value 2
       1: key 191, value 1
       ...
       15: key 205, value 1
       16: key 220, value 2 ---> consistency violated

  PPP+1: OOBucket with ... keys (bounds 206, ...)
        0: key 206, value 2
        1: key 207, value 1
        ....
        13: key 219, value 1
        14: key 221, value 1
        ....

There were cases where the misplaced key should have gone to "PPP+2".


As noted earlier: the misplaced key has always been the last (index 16)
key in an OOBucket with 17 keys.
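
The hypothesis quoted earlier (the last element of a splitting bucket not
being moved into the new bucket) can be pictured with a toy model in
Python. This is only a sketch of the suspicion, not the BTrees C code: the
function "split_bucket", its "forget_last" switch, and the 32 key starting
bucket are invented for the illustration, and it says nothing about
whether this is the real cause.

```python
def split_bucket(keys, forget_last=False):
    """Split a sorted key list into (left, right) halves.

    Hypothetical model, not the BTrees implementation. With
    forget_last=True the last key is deliberately left behind in the
    left bucket, mimicking the suspected failure.
    """
    middle = len(keys) // 2
    if forget_last:
        return keys[:middle] + keys[-1:], keys[middle:-1]
    return keys[:middle], keys[middle:]


keys = list(range(190, 222))        # 32 sorted keys, 190..221
left, right = split_bucket(keys, forget_last=True)
upper_bound = right[0]              # separator key the parent would store
# Left bucket size, its last key, and the bound that key should respect.
print(len(left), left[-1], upper_bound)
```

In this toy model the left bucket ends up with 17 keys and its last key
(index 16) lies above the separator, while all other keys are in order.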
       

If helpful, I can force output from stdout and stderr into a single
log file and post (or send privately to you?) the resulting file.
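
One way to get both streams into a single file from within Python is
sketched below. It assumes a current Python 3 ("contextlib.redirect_stdout"
and "contextlib.redirect_stderr" do not exist in older versions), and the
names "run_logged" and "noisy_test" are made up for the example. It only
captures output written through "sys.stdout" and "sys.stderr", not output
written directly by C code or by child processes.

```python
import contextlib
import os
import sys
import tempfile


def run_logged(func, log_path):
    """Call func() with sys.stdout and sys.stderr both appended to log_path."""
    with open(log_path, "a") as log:
        with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
            return func()


def noisy_test():
    # Stand-in for a test that dumps to stdout and reports to stderr.
    print("tree dump (stdout)")
    print("consistency failure (stderr)", file=sys.stderr)


log_path = os.path.join(tempfile.mkdtemp(), "testrunner.log")
run_logged(noisy_test, log_path)
with open(log_path) as log:
    captured = log.read()
```

Because both streams share one file object, the lines appear in the log in
the order in which they were written.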


Dieter


