[ZODB-Dev] RE: [Zope.Com Geeks] several critical bug fixes for ZODB 3.1 / ZEO 2.0

Tim Peters tim at zope.com
Wed Jun 11 23:43:41 EDT 2003


[Jeremy Hylton]
> ...
> Most of the bugs were in ZEO.

Specifically, this one wasn't:

> Fix a bug in conflict resolution that failed to ghostify an object if
> it was involved in a conflict.  (This code may be redundant, but it
> has been fixed regardless.)

This was in the ZODB Connection object.  However, it didn't appear possible
for this bug to have any effect outside of ZEO (that's why we're saying "it
may be redundant" -- it probably is, but better safe than sorry).  There's
another, more general mechanism that also invalidates stale objects involved
in conflicts (as well as modified objects not involved in conflicts), and it
appeared to work fine outside of ZEO.  The reason this mechanism didn't work
in ZEO was the first bug Jeremy mentioned:

> Fixed a critical race condition in ZEO's cache consistency that could
> cause invalidations to be lost. ...It was possible for the zrpc layer
> to re-order the response to the load and the invalidation, effectively
> causing the invalidation to be lost.

The "zrpc layer" is specific to ZEO, so non-ZEO applications didn't suffer
this rare re-ordering.
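
To make the failure mode concrete, here is a toy model of why swapping a
load response and an invalidation loses the invalidation.  It is only a
sketch: apply_messages, the message tuples, and the dict used as a cache are
all made up for illustration and are not the zrpc or ZEO cache code.

```python
def apply_messages(messages):
    """Apply (kind, oid, serial) messages to a toy client cache, in order."""
    cache = {}
    for kind, oid, serial in messages:
        if kind == "load":
            # A load response stores the returned revision in the cache.
            cache[oid] = serial
        elif kind == "invalidate":
            # An invalidation drops whatever copy is cached (if any).
            cache.pop(oid, None)
    return cache


# Hypothetical messages: the load returns revision 1 of "obj1", and the
# invalidation announces that another client has since committed revision 2.
load_response = ("load", "obj1", 1)
invalidation = ("invalidate", "obj1", 2)

# Delivered in the order sent: the stale copy is evicted.
in_order = apply_messages([load_response, invalidation])

# Delivered re-ordered: the invalidation finds nothing to evict, and the
# stale revision is cached afterwards with nothing left to remove it.
reordered = apply_messages([invalidation, load_response])
```

In the in-order case the cache ends up empty, so the next read goes back to
the server.  In the re-ordered case revision 1 stays cached, which is the
sense in which the invalidation is "lost".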

I'll add that we now have a stress test that creates conflicts at a
prodigious rate using ZEO (multiple threads modifying and committing the
same set of objects across multiple connections just as fast as the hardware
can do it, on a single machine using 'localhost', i.e. bashing threads into
each other much faster than can be done over a network).  Before these bug
fixes, it typically provoked corruption and/or data loss within 15 seconds,
on both Linux and Windows boxes.  After the bug fixes, I've run it overnight
more than once without any hint of trouble.
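
For readers who want the general shape of such a test, here is a minimal
sketch using only the standard library.  It is not the actual ZEO stress
test: ToyStore, worker, and run_stress are hypothetical names, and a lock
guarded dict with per-object serial numbers stands in for the storage, with
a failed commit playing the role of a conflict.

```python
import threading


class ToyStore:
    """Stand-in storage: oid -> (serial, value), with optimistic commits."""

    def __init__(self, n_objects):
        self._lock = threading.Lock()
        self._data = {oid: (0, 0) for oid in range(n_objects)}

    def load(self, oid):
        with self._lock:
            return self._data[oid]

    def commit(self, oid, seen_serial, new_value):
        """Commit only if nobody else committed since we loaded."""
        with self._lock:
            serial, _ = self._data[oid]
            if serial != seen_serial:
                return False  # conflict: our copy was stale
            self._data[oid] = (serial + 1, new_value)
            return True


def worker(store, n_objects, n_commits, stats):
    conflicts = 0
    for i in range(n_commits):
        oid = i % n_objects  # every thread hits the same few objects
        while True:
            serial, value = store.load(oid)
            if store.commit(oid, serial, value + 1):
                break
            conflicts += 1  # retry after a conflict
    stats.append(conflicts)


def run_stress(n_threads=4, n_objects=3, n_commits=500):
    store = ToyStore(n_objects)
    stats = []
    threads = [
        threading.Thread(target=worker,
                         args=(store, n_objects, n_commits, stats))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = sum(store.load(oid)[1] for oid in range(n_objects))
    return total, sum(stats)
```

The check at the end is the point: every successful commit adds exactly one,
so the values must sum to n_threads * n_commits no matter how many conflicts
occurred.  A lost update would show up as a smaller total.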

The reason "most of the bugs were in ZEO" is (I believe) that we didn't have
a stress test like this using ZEO before.  We did for ZODB sans ZEO, and we
haven't found any similar bugs in ZODB since the ZODB stress tests stopped
provoking problems.  These tests put a much heavier load on the ZODB and ZEO
code than any real-life app could, and some of the bugs they've provoked are
so excruciatingly obscure and unlikely that I doubt they've ever been seen
in the field.



