[ZODB-Dev] Storage API change: Checking for reading out-of-date data

Jim Fulton jim at zope.com
Mon Aug 30 17:36:50 EDT 2010


ZODB uses multi-version concurrency control to ensure that data read
are consistent.  It doesn't check or require that data read be up to
date.  For read-only transactions, this is appropriate.

Even for write transactions, not checking whether reads are up to date
isn't typically a problem, since the important data read is also
updated and we check for write conflicts.

The approach used by ZODB is a common one and represents a generally
good tradeoff between consistency and performance.

The approach, however, can run into problems when data from one object
are read and used to update a different object.  I've mistakenly
tended to view this situation as an edge case.  However, BTrees,
perhaps the most heavily used data structure in ZODB applications,
follow this data access pattern. In particular, internal nodes are
read to determine which subnodes data should be written to. An out of
date internal node can lead to data in BTrees being misplaced.  This
doesn't happen very often, and when it does happen, it's been pretty
mysterious.
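
To make the failure mode concrete, here is a toy model of it.  It is
not the BTrees or ZODB implementation: ToyStorage, its load and commit
methods, and the "root"/"bucket" object ids are all made up for the
illustration, and the concurrent change (repointing the root at a
copied bucket) is a simplification of a real node split.  The storage
only checks for conflicts on objects that are written, so a write
routed by an out-of-date root goes through without error.

```python
class ConflictError(Exception):
    """Raised when a written object changed since it was read."""


class ToyStorage:
    """Hypothetical in-memory store with write-conflict checks only."""

    def __init__(self):
        self.data = {}        # oid -> (serial, state)
        self.last_tid = 0

    def load(self, oid):
        serial, state = self.data[oid]
        return serial, dict(state)   # copy so callers can't mutate the store

    def commit(self, writes):
        """writes maps oid -> (serial seen at read time or None if new, state)."""
        for oid, (seen, _state) in writes.items():
            current = self.data.get(oid, (None, None))[0]
            if current != seen:
                raise ConflictError(oid)
        self.last_tid += 1
        for oid, (_seen, state) in writes.items():
            self.data[oid] = (self.last_tid, dict(state))
        return self.last_tid


storage = ToyStorage()
storage.commit({
    "root": (None, {"child": "bucket1"}),   # stands in for an internal node
    "bucket1": (None, {"a": 1}),
})

# Transaction A reads the root and the bucket the root points to.
root_serial_a, root_a = storage.load("root")
child_a = root_a["child"]
bucket_serial_a, bucket_a = storage.load(child_a)

# Transaction B commits first: it copies the data into a new bucket and
# repoints the root.  bucket1 itself is left untouched.
root_serial_b, root_b = storage.load("root")
_, bucket_b = storage.load("bucket1")
storage.commit({
    "bucket2": (None, bucket_b),
    "root": (root_serial_b, {"child": "bucket2"}),
})

# A now writes into the bucket chosen from its out-of-date root.  Only
# bucket1 is checked, and bucket1 has not changed, so no error is raised.
bucket_a["b"] = 2
storage.commit({child_a: (bucket_serial_a, bucket_a)})

live_child = storage.load("root")[1]["child"]
live_bucket = storage.load(live_child)[1]
misplaced = "b" not in live_bucket and "b" in storage.load("bucket1")[1]
```

In this sketch the new key ends up in a bucket the current root no
longer refers to, and nothing signals that anything went wrong.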

This is a fairly serious problem.  It's serious enough that I'm going
to add some APIs in ZODB 3.10 to deal with it.  One of these is:

  class ReadVerifyingStorage(IStorage):

      def checkCurrentSerialInTransaction(oid, serial):
          """Check whether the given serial number is current.

          The method is called during the first phase of 2-phase commit
          to verify that data read in a transaction is current.

          The storage should raise a ConflictError if the serial is not
          current, although it may raise the exception later, in a call
          to store or in a call to tpc_vote.

          If no exception is raised, then the serial must remain current
          through the end of the transaction.
          """

The tricky thing about this is the last paragraph.  If the method
doesn't raise an error, then there can't be updates to the object
until after the transaction commits.  For most current
implementations, this implies that the storage lock is held when this
is called.  For ZEO, some special care will be necessary because the
storage lock isn't acquired until the very end of the first phase of
2-phase commit.
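
A minimal sketch of a storage that meets this requirement by holding a
commit lock for the whole of the first phase might look like the
following.  Only the method name checkCurrentSerialInTransaction and
its (oid, serial) arguments come from the proposal above; the class
name, the simplified tpc_begin/store/tpc_finish/tpc_abort signatures,
the local ConflictError, and the string oids are assumptions made to
keep the example self-contained.  It does not model ZEO, where the
lock is acquired later.

```python
import threading


class ConflictError(Exception):
    """Stand-in for the storage's conflict exception."""


class ToyReadVerifyingStorage:
    """Hypothetical storage that holds one lock from tpc_begin to the end."""

    def __init__(self):
        self._commit_lock = threading.Lock()
        self._states = {}     # oid -> state
        self._serials = {}    # oid -> serial of the current revision
        self._pending = None
        self._last_tid = 0

    def load(self, oid):
        return self._states[oid], self._serials[oid]

    def tpc_begin(self):
        # Holding the lock from here on is what keeps a verified serial
        # current until the transaction finishes or aborts.
        self._commit_lock.acquire()
        self._pending = {}

    def store(self, oid, serial, state):
        if self._serials.get(oid) != serial:
            raise ConflictError(oid)
        self._pending[oid] = state

    def checkCurrentSerialInTransaction(self, oid, serial):
        # Same test as for a write, but nothing is stored for this oid.
        if self._serials.get(oid) != serial:
            raise ConflictError(oid)

    def tpc_finish(self):
        self._last_tid += 1
        for oid, state in self._pending.items():
            self._states[oid] = state
            self._serials[oid] = self._last_tid
        self._release()

    def tpc_abort(self):
        self._release()

    def _release(self):
        self._pending = None
        self._commit_lock.release()


storage = ToyReadVerifyingStorage()
storage.tpc_begin()
storage.store("root", None, "v1")
storage.store("leaf", None, "x")
storage.tpc_finish()

# Transaction A reads both objects.
_, root_serial = storage.load("root")
_, leaf_serial = storage.load("leaf")

# Transaction B updates the root before A commits.
storage.tpc_begin()
storage.store("root", root_serial, "v2")
storage.tpc_finish()

# A writes only the leaf, but asks the storage to verify its read of root.
storage.tpc_begin()
try:
    storage.store("leaf", leaf_serial, "y")
    storage.checkCurrentSerialInTransaction("root", root_serial)
    storage.tpc_finish()
    outcome = "committed"
except ConflictError:
    storage.tpc_abort()
    outcome = "conflict"
```

Here the write to the leaf alone would have succeeded; it is the
check on the root that turns the stale read into a conflict.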

I'm particularly concerned about the impact on RelStorage.

This API will be used whenever a BTree is modified, so it will be used
fairly often. It won't be used for all reads, although future
versions of ZODB might provide an option to check all reads, or all
reads in transactions that write data.
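
The three checking policies mentioned here could be sketched as a
function that picks which reads to verify at commit time.  This is
purely illustrative: the function name, the policy names, and the
idea of a "marked" set (reads a data structure such as a BTree asks
to have verified) are assumptions, not part of the proposed API.
Objects that are also written are skipped, since they already get a
write-conflict check.

```python
def oids_to_verify(reads, writes, marked, policy="marked"):
    """Return {oid: serial} for the reads that should be verified.

    reads  -- dict mapping oid -> serial seen when the object was read
    writes -- set of oids written in the transaction
    marked -- set of oids explicitly flagged for verification
    policy -- hypothetical policy name, one of the three below
    """
    if policy == "marked":
        candidates = marked
    elif policy == "all_reads":
        candidates = set(reads)
    elif policy == "reads_when_writing":
        candidates = set(reads) if writes else set()
    else:
        raise ValueError("unknown policy: %r" % (policy,))
    return {
        oid: reads[oid]
        for oid in candidates
        if oid in reads and oid not in writes
    }


reads = {"root": 5, "leaf": 7, "other": 3}
only_marked = oids_to_verify(reads, {"leaf"}, {"root"})
every_read = oids_to_verify(reads, {"leaf"}, {"root"}, policy="all_reads")
read_only = oids_to_verify(reads, set(), {"root"}, policy="reads_when_writing")
```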

Jim

--
Jim Fulton
