[ZODB-Dev] Problem with handling of data managers that join transactions after savepoints

Jim Fulton jim at zope.com
Mon May 10 16:41:17 EDT 2010


The following is complex. Unless you're a ZODB developer or nearly so,
you may want to skip this. :)

While looking into a problem we've run into, I found a flaw in the
way savepoints are handled.  It was exposed by recent tightening of
the way transaction-related methods are called.

The problem arises from the way the data manager abort method:

    def abort(transaction):
        """Abort a transaction and forget all changes.

        Abort must be called outside of a two-phase commit.

        Abort is called by the transaction manager to abort transactions
        that are not yet in a two-phase commit.
        """

is called.  As the documentation says, this is called when a
transaction is aborted.  Any data manager called should assume that it
is no longer joined to a live transaction.  ZODB's Connections assume
exactly this.

When a data manager joins a transaction after there have been
savepoints in the transaction, there needs to be a way to handle
rolling back to the older savepoints.  It's too late to ask the data
manager for a data-manager savepoint.  In this case, a special
data-manager savepoint is created that calls abort on the new data
manager whenever an older savepoint is rolled back.  This use of abort
is at odds with the documentation of the abort method, because rolling
back a savepoint doesn't abort the transaction.

The problem for ZODB, and presumably for other data managers, is that
when abort is called, the data manager (Connection) marks itself as
needing to join the transaction.  If data are modified in the
connection, the connection will join again, at which point the data
manager will be doubly joined.  When the transaction is committed, the
transaction methods, tpc_begin, commit, tpc_vote, and tpc_finish are
called multiple times.  In ZODB 3.10, the second tpc_begin call leads
to an error, because it had been called before.

(If a pre-ZODB 3.10 ZEO client talked to a ZODB 3.10.0a1 server, the
multiple calls led to a commit lock being held forever on the server,
preventing further commits.  ZODB 3.10.0a2 detects the multiple calls
and raises an error at the second vote call, causing the client
transaction to fail and the server to continue committing other
transactions.)
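
To make the failure concrete, here is a toy model of the sequence
described above.  It is only a sketch: ToyTransaction, ToySavepoint and
ToyConnection are made-up stand-ins, not the real transaction or ZODB
classes, and only tpc_begin is modeled.

```python
# Toy model only: these classes are NOT the real transaction package.

class ToyTransaction:
    def __init__(self):
        self.resources = []          # joined data managers

    def join(self, dm):
        self.resources.append(dm)

    def savepoint(self):
        # Remember who was joined when the savepoint was made.
        return ToySavepoint(self, list(self.resources))

    def commit(self):
        for dm in self.resources:
            dm.tpc_begin(self)


class ToySavepoint:
    def __init__(self, txn, joined_then):
        self.txn = txn
        self.joined_then = joined_then

    def rollback(self):
        # Managers that joined after this savepoint get abort() called,
        # but they are left in txn.resources.
        for dm in self.txn.resources:
            if dm not in self.joined_then:
                dm.abort(self.txn)


class ToyConnection:
    def __init__(self):
        self.needs_to_join = True
        self.tpc_begin_calls = 0

    def modify(self, txn):
        if self.needs_to_join:
            txn.join(self)
            self.needs_to_join = False

    def abort(self, txn):
        # Assumes the transaction is over, so plan to join again.
        self.needs_to_join = True

    def tpc_begin(self, txn):
        self.tpc_begin_calls += 1


txn = ToyTransaction()
conn = ToyConnection()
sp = txn.savepoint()   # savepoint taken before the connection joins
conn.modify(txn)       # first join
sp.rollback()          # abort() is called; connection thinks it is unjoined
conn.modify(txn)       # second join of the same data manager
txn.commit()
print(len(txn.resources), conn.tpc_begin_calls)  # prints: 2 2
```

The connection ends up in the resource list twice, so every
commit-time method is delivered to it twice.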

Among the ways to fix this:

A. Change transaction._transaction.AbortSavepoint to remove the
   data manager from the transaction's resources (joined data managers)
   when the savepoint is rolled back and abort is called on the data
   manager. Then, if the data manager rejoins, it will have joined
   only once.

   Update the documentation of the data manager abort method (in
   IDataManager) to say that abort is called either when a transaction
   is aborted or when rolling back to a savepoint created before the
   data manager joined, and that the data manager is no longer joined
   to the transaction after abort is called.

   This is a backward incompatible change to the interface (because it
   weakens a precondition) that is unlikely to cause harm.

B. Disallow joining a transaction after there are savepoints.

   This makes a common use case more complicated.  Suppose I want to
   do a batch of work made up of work items. I want to commit the
   batch as a whole and want to skip items when there are problems. In
   pseudo code this looks like:

      import transaction

      for item in items:
          savepoint = transaction.savepoint()
          try:
              ...  # do the item of work
          except Exception:
              # there was a problem
              savepoint.rollback()  # skip the item and keep going
      transaction.commit()

   Note that the first savepoint is created before we do anything.
   Disallowing joining after savepoints would make this scenario a lot
   more complicated.

C. Add a new data manager method to handle this use case. The
   semantics of the new method is that the data manager should
   discard any changes but should not rejoin the transaction.

   If the data manager doesn't support this new method, then an error
   is raised if a savepoint is rolled back that was created before the
   data manager joined.

   This is more backward incompatible than A and complicates data
   managers.

D. Change the transaction join method to ignore multiple joins of the
   same data manager. This would just hide a deeper problem, which
   rarely turns out well in the long term.
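
For comparison, option C might look roughly like the following.  This
is a sketch under assumptions: the method name discard_changes, the
TypeError, and all of the classes are invented for illustration and do
not exist in the transaction package.

```python
# Hypothetical sketch of option C; no name here is a real API.

class ToyLateJoinSavepoint:
    """Savepoint stand-in for a data manager that joined late."""

    def __init__(self, dm):
        self.dm = dm

    def rollback(self):
        # 'discard_changes' is an invented name for the new method.
        discard = getattr(self.dm, "discard_changes", None)
        if discard is None:
            raise TypeError(
                "data manager cannot roll back to a savepoint "
                "created before it joined")
        discard()


class NewStyleManager:
    def __init__(self):
        self.changes = ["some change"]
        self.joined = True

    def discard_changes(self):
        # Drop the changes but stay joined, so there is no rejoin.
        self.changes.clear()


class OldStyleManager:
    """Has no discard_changes method."""


new = NewStyleManager()
ToyLateJoinSavepoint(new).rollback()    # changes dropped, still joined

try:
    ToyLateJoinSavepoint(OldStyleManager()).rollback()
    old_failed = False
except TypeError:
    old_failed = True                   # unsupported manager is an error
```

Every existing data manager would hit the error path until it grew the
new method, which is the extra incompatibility noted under C.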

I plan to implement A soon if there are no objections.
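
A minimal sketch of what option A amounts to, using toy classes rather
than the real transaction._transaction code (ToyTransaction,
ToyAbortSavepoint and ToyConnection are assumptions made for
illustration):

```python
# Toy model only: not the real transaction._transaction code.

class ToyTransaction:
    def __init__(self):
        self.resources = []          # joined data managers

    def join(self, dm):
        self.resources.append(dm)


class ToyAbortSavepoint:
    """Stand-in for the savepoint given to a late-joining data manager."""

    def __init__(self, dm, txn):
        self.dm = dm
        self.txn = txn

    def rollback(self):
        # Option A: unjoin the data manager, then abort it, so a later
        # rejoin leaves exactly one entry for it.
        self.txn.resources.remove(self.dm)
        self.dm.abort(self.txn)


class ToyConnection:
    def __init__(self):
        self.needs_to_join = True

    def modify(self, txn):
        if self.needs_to_join:
            txn.join(self)
            self.needs_to_join = False

    def abort(self, txn):
        self.needs_to_join = True


txn = ToyTransaction()
conn = ToyConnection()
conn.modify(txn)                         # joins after an earlier savepoint
ToyAbortSavepoint(conn, txn).rollback()  # older savepoint is rolled back
after_rollback = len(txn.resources)      # 0: no longer joined
conn.modify(txn)                         # rejoins
after_rejoin = len(txn.resources)        # 1: joined exactly once
```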

Unless someone somehow convinces me to do D, I'll also add an
assertion in the Transaction.join method to raise an error if a
data manager joins more than once.
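
Such a guard could be as small as the following sketch.  ToyTransaction
is a made-up class, and the choice of ValueError is an assumption; the
real check might use a different exception.

```python
# Toy model only: not the real Transaction.join implementation.

class ToyTransaction:
    def __init__(self):
        self.resources = []          # joined data managers

    def join(self, dm):
        # Compare by identity: the same object must not join twice.
        if any(dm is joined for joined in self.resources):
            raise ValueError("data manager joined more than once")
        self.resources.append(dm)


txn = ToyTransaction()
dm = object()                        # any object can play the data manager
txn.join(dm)

try:
    txn.join(dm)                     # second join of the same object
    second_join_rejected = False
except ValueError:
    second_join_rejected = True
```

Failing at join time points at the code that rejoined, rather than at
a later tpc_begin call.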

Jim

--
Jim Fulton

