[ZODB-Dev] summary of version discussion

Jeremy Hylton jeremy@zope.com
Thu, 5 Dec 2002 18:33:12 -0500


We had a very profitable discussion of versions in ZODB4 last week.
(About 75 messages posted.)  I'm not ready to make any decisions about
versions, but I'd like to summarize the discussion and open issues.

It's important to clarify what we mean by the term versions.  Many
ZODB storages keep multiple revisions of an object, and the database
has several features (like undo) that take advantage of these
revisions.  In the database literature, this kind of database is
called a multiversion database.

Unfortunately, a ZODB version has nothing to do with object revisions
or the traditional definition of "multiversion."  The version feature
in ZODB was developed to support a primitive version control mechanism
for Zope.  A version is a collection of object modifications that are
not visible to other database clients unless they explicitly use the
version.  (A version is named by an arbitrary string.)  A client using
a different version or no version at all will see the object as it was
before being modified in the version.  An object modified in a version
is locked, so that it can only be modified in the version.  A later
transaction can either abort or commit a version, which unlocks the
objects.  A commit also makes the current versioned objects visible to
everyone.
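
To make these semantics concrete, here is a minimal in-memory sketch.
It is a toy model, not ZODB code: the names ToyVersionedStore,
VersionLockError, commit_version and abort_version are hypothetical
and only illustrate the visibility and locking rules described above.

```python
class VersionLockError(Exception):
    """Raised when an object locked by one version is modified elsewhere."""


class ToyVersionedStore:
    """Hypothetical in-memory model of version semantics (not a ZODB API)."""

    def __init__(self):
        self._main = {}      # oid -> data visible to everyone
        self._pending = {}   # version name -> {oid: data}, not yet committed
        self._locks = {}     # oid -> name of the version that locked it

    def store(self, oid, data, version=""):
        owner = self._locks.get(oid)
        if owner is not None and owner != version:
            raise VersionLockError(f"{oid!r} is locked by version {owner!r}")
        if version:
            # Modifying an object in a version also locks it to that version.
            self._pending.setdefault(version, {})[oid] = data
            self._locks[oid] = version
        else:
            self._main[oid] = data

    def load(self, oid, version=""):
        # Versioned data is visible only to clients that name the version.
        if version and oid in self._pending.get(version, {}):
            return self._pending[version][oid]
        return self._main[oid]

    def commit_version(self, version):
        # Publish the versioned changes to everyone, then unlock.
        self._main.update(self._pending.pop(version, {}))
        self._unlock(version)

    def abort_version(self, version):
        # Throw the versioned changes away, then unlock.
        self._pending.pop(version, None)
        self._unlock(version)

    def _unlock(self, version):
        for oid in [o for o, v in self._locks.items() if v == version]:
            del self._locks[oid]


store = ToyVersionedStore()
store.store("page", "old")
store.store("page", "new", version="redesign")
print(store.load("page"))                      # data outside the version
print(store.load("page", version="redesign"))  # data inside the version
```

Until commit_version("redesign") or abort_version("redesign") runs, a
store("page", ...) call made outside the version raises
VersionLockError in this model.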

A Zope user can modify objects in a version to make changes to a site
without making them visible to all users until the changes are done.
However, the locking behavior makes versions difficult to use in
practice.  If a page is re-indexed in a version, the catalog gets
locked and no further changes to the catalog can be made until the
version commits.  As a result, version control in Zope is now usually
done without using ZODB versions.

(Side note: If versions are kept in ZODB4, we're going to give
them a different name to avoid the endless confusion over the
distinction between an object revision and an object version.)

That's the background on versions.  Guido proposed that the version
feature be removed from ZODB4.  To summarize his basic argument:

   People are cautioned against using versions in Zope.  If they
   aren't useful in Zope, ZODB doesn't need to provide them.  Removing
   versions will remove complexity from code all over the place: the
   version argument to some APIs can be removed, the ZEO cache
   implementation and wire protocol become simpler, and so on.

A number of people agreed with Guido.  They said, in essence, we've
never used versions so we won't miss them; several folks put it more
colorfully.  Greg Ward said his primary complaint with ZODB is that
it has many complex features only because of peculiar Zope
requirements.  Versions are one example, he said.  (What are the
others, Greg?)

A few Zope users noted that they still use versions.  Tres Seaver
said:

> Although versions can't play well with ZCatalog, and hence with
> "content", they work nicely for making "on the side" changes to
> "software" (e.g., to the main template for the site).  I would be
> averse to killing them off before seeing just how much
> "through-the-web" software / configuration happens in Zope3, and
> whether alternative mechanisms (filesystem synchronization, for
> instance) can address the same use cases with less grief.

Tres seems to disagree with one of Guido's premises -- that the
version feature isn't useful in Zope.  Several other people mentioned
that they use versions, too, although they didn't offer as detailed a
rationale as Tres.

I'm probably the primary advocate of versions.  My rationale is that
the locking feature of versions allows long-running actions to be
performed in a transactional manner.  Since ZODB uses optimistic
concurrency control, a long-running transaction has a higher chance of
failing with a conflict error.  In the extreme, it could be impossible
to commit a long-running transaction because some other transaction
would always introduce a conflict.
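
The failure mode can be sketched with a toy optimistic database.  This
is an illustration under simplifying assumptions, not ZODB's
implementation: ToyOptimisticDB, ToyTxn and the per-object serial
counter are hypothetical, and conflicts are checked only for objects
the transaction wrote.

```python
class ConflictError(Exception):
    """Raised at commit when a written object changed since it was read."""


class ToyOptimisticDB:
    """Hypothetical store keeping a serial number per object."""

    def __init__(self):
        self._data = {}  # oid -> (serial, value)

    def begin(self):
        return ToyTxn(self)


class ToyTxn:
    def __init__(self, db):
        self._db = db
        self._seen = {}    # oid -> serial observed on first access
        self._writes = {}  # oid -> new value, applied only at commit

    def read(self, oid):
        serial, value = self._db._data.get(oid, (0, None))
        self._seen.setdefault(oid, serial)
        return self._writes.get(oid, value)

    def write(self, oid, value):
        self.read(oid)  # remember which revision this change is based on
        self._writes[oid] = value

    def commit(self):
        # Validate first: nobody may have committed a newer revision.
        for oid in self._writes:
            current, _ = self._db._data.get(oid, (0, None))
            if current != self._seen[oid]:
                raise ConflictError(oid)
        for oid, value in self._writes.items():
            self._db._data[oid] = (self._seen[oid] + 1, value)


db = ToyOptimisticDB()
long_txn = db.begin()
long_txn.write("catalog", "reindexed")   # the long-running work starts

short_txn = db.begin()
short_txn.write("catalog", "quick fix")
short_txn.commit()                       # a short transaction gets in first

try:
    long_txn.commit()
except ConflictError as exc:
    print("long transaction lost:", exc)
```

The longer long_txn stays open, the more chances short transactions
have to commit first, which is the starvation risk described above.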

I have never written an application that uses versions in this way,
but the feature seems useful if ZODB is to support applications with
this requirement.  One could make a YAGNI argument against keeping
versions for only this reason.  (Ex: We ain't never goin' to write
applications like that.)  The only example I'm familiar with is one
sketched by Sean Upton about web service transactions.  See
http://lists.zope.org/pipermail/zope3-dev/2002-October/003112.html

I don't know what other databases, particularly optimistic ones, do
about the problem.  There's an analogous problem with pessimistic
databases.  A long-running transaction can lock resources for a very
long time, and the cost of a restart becomes very high.

Some pessimistic databases use chained transactions -- CHAIN WORK as
opposed to BEGIN WORK.  Basically, with two chained transactions, when
the first commits the second begins immediately without any
possibility of another transaction modifying the data.  This doesn't
really address all the problems.  Updates become visible as
transactions commit.  Some locks could still be held a long time.

   Gray and Reuter make two interesting observations: 

   "Activities of long duration have been a worry to users of
   transaction-oriented systems from the beginning, because flat
   or nested transactions as they are implemented now simply do
   not go well with such computations."

   "Note the change in perspective: what has always been the virtue of
   atomicity suddenly turns into the vice of 'work lost.'  Real
   applications are like that."

A more esoteric approach is called a "saga."  I don't understand it
completely, but the basic idea is to extend chained transactions with
compensating transactions.  For each transaction, there is a
compensating action that undoes its effects.  If a sequence of chained
transactions exits abnormally, all the compensating transactions are
committed.
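
A rough sketch of the idea, assuming each step is a pair of plain
callables and that compensations run in reverse order of the steps
they undo (an assumption of this sketch, not something stated above).
The run_saga helper is hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    Each action stands for one committed transaction in the chain.  If an
    action fails, the compensations for the steps that already completed
    are run, newest first, and False is returned.
    """
    compensations = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensation)
    return True


log = []


def failing_step():
    raise RuntimeError("third step failed")


ok = run_saga([
    (lambda: log.append("reserve"), lambda: log.append("release")),
    (lambda: log.append("charge"), lambda: log.append("refund")),
    (failing_step, lambda: log.append("never runs")),
])
print(ok, log)
```

Unlike a single abort, the effects of the early steps were visible to
other clients until the compensations ran.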

I see some analogy between a saga and a version, but it's rough.  This
suggests that using versions for long-running activities is probably
the right general direction.  It would probably be useful to do more
research and see what more recent solutions to the problem are.  (The
key paper on sagas was published in 1987 and the Gray and Reuter book
I'm referring to was published around 1993.)

So there's an argument to be made for keeping versions because they're
useful for ZODB applications.  It's not clear how strong the argument
is, because we don't have concrete application requirements.

The second half of Guido's argument is that versions are complicated
to implement.  Even if they might be useful, does their utility
outweigh their complexity?

I think that a lot of the complexity has already been paid for.  We've
got (probably) correct implementations of versions for two storages
and ZEO.  Some of the complexity is ongoing.  Every new storage needs
to decide whether it will support versions, and has to implement
APIs that include version-specific arguments even if the storage
doesn't support versions.

I wonder, however, if we can mitigate this cost.  The basic
architecture for versions could remain, but the interfaces could all
be changed.  Here are two ideas that seem worthwhile to pursue.

  - Require that when a storage commits a transaction all its work be
    part of the same version (possibly the default version).  Then the
    version argument could be removed from things like store().  Not
    sure how the storage would be told of the version.  Perhaps the
    version gets added to the transaction metadata.  Perhaps there's a
    new call.

  - Change ZEO to deal with versions just like a regular storage.  A
    lot of the complexity of versions in ZEO comes from the zeoLoad()
    call which promises to return non-version data and version data if
    it is available.  It's actually costly to support this API for
    FileStorage, since checking whether an object is modified in a
    version involves at least one extra seek().  Instead, just have
    a regular load() call.
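
One possible shape for these two ideas, as a sketch only.  The classes
ToyTransaction and ToyStorage and the method names begin() and
finish() are hypothetical, not existing ZODB interfaces.  The sketch
assumes the version travels in the transaction metadata, so store()
takes no version argument, and a single load() returns version data
when it exists and non-version data otherwise.

```python
class ToyTransaction:
    """Hypothetical transaction whose metadata carries the version name."""

    def __init__(self, version=""):
        self.version = version  # "" means the default, non-version data


class ToyStorage:
    """Hypothetical storage: one version per transaction, one load() call."""

    def __init__(self):
        self._data = {}    # (oid, version) -> data
        self._txn = None
        self._staged = {}  # oid -> data written by the current transaction

    def begin(self, txn):
        self._txn = txn
        self._staged = {}

    def store(self, oid, data):
        # No version argument: everything in this transaction shares one.
        self._staged[oid] = data

    def finish(self):
        for oid, data in self._staged.items():
            self._data[(oid, self._txn.version)] = data
        self._txn = None
        self._staged = {}

    def load(self, oid, version=""):
        # Return version data if present, else fall back to non-version data.
        key = (oid, version)
        if key in self._data:
            return self._data[key]
        return self._data[(oid, "")]


storage = ToyStorage()
storage.begin(ToyTransaction())
storage.store("a", "public")
storage.finish()

storage.begin(ToyTransaction(version="draft"))
storage.store("a", "work in progress")
storage.finish()

print(storage.load("a"), "|", storage.load("a", version="draft"))
```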

One counter-argument on the complexity front is that removing versions
would mean that lots of code would change between ZODB3 and ZODB4.
That's not necessarily bad, but I'd be very happy if most of the
FileStorage code was the same between the two versions.  It's easier
to maintain.  Jim observed that we could keep the storage
implementation the same at the lowest level but in ZODB4 add an adapter
that translates from the simpler API to the more complex one.
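
A sketch of what such an adapter might look like.  All names here are
hypothetical, and the older store() signature is simplified for
illustration; it is not the real ZODB3 signature.  The sketch assumes
the simpler API carries the version on the transaction object.

```python
class LegacyStorage:
    """Stand-in for an unchanged low-level storage; store() takes a version."""

    def __init__(self):
        self.calls = []

    def store(self, oid, data, version, txn):
        # Simplified, hypothetical signature; it only records the call.
        self.calls.append((oid, data, version))


class SimpleTransaction:
    """Hypothetical transaction carrying the version as metadata."""

    def __init__(self, version=""):
        self.version = version


class VersionlessAdapter:
    """Expose store(oid, data, txn) on top of the version-aware call."""

    def __init__(self, legacy):
        self._legacy = legacy

    def store(self, oid, data, txn):
        # Translate: pull the version out of the transaction metadata.
        self._legacy.store(oid, data, txn.version, txn)


legacy = LegacyStorage()
adapter = VersionlessAdapter(legacy)
adapter.store("a", "data", SimpleTransaction(version="draft"))
adapter.store("b", "data", SimpleTransaction())
print(legacy.calls)
```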

A few people discussed how versions could be used to support
long-running activities.  Eron Lloyd asked whether they could be
exposed through WebDAV in the ZMI.  (I have no clue.)  Andrew Kuchling
made a wise remark about using versions to support a client session
for a Zope-like application.  If the user walks away, the objects
remain locked in the version.  I think we'd need to make versions be
more like leases.  If you don't renew the version lease occasionally,
it times out and the version is aborted.  (More complexity, but Barry
thought not too much more.)
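
A lease could be sketched roughly as follows.  LeaseTable, renew() and
reap() are hypothetical names, the abort callback stands in for
whatever actually aborts a version, and time is passed in explicitly
to keep the example deterministic.

```python
class LeaseTable:
    """Hypothetical bookkeeping for version leases."""

    def __init__(self, abort_version, duration):
        self._abort_version = abort_version  # called with an expired name
        self._duration = duration            # lease length in seconds
        self._expires = {}                   # version name -> expiry time

    def renew(self, version, now):
        """Start a lease, or extend an existing one, from time `now`."""
        self._expires[version] = now + self._duration

    def reap(self, now):
        """Abort every version whose lease ran out; return their names."""
        expired = [v for v, t in self._expires.items() if t <= now]
        for version in expired:
            del self._expires[version]
            self._abort_version(version)
        return expired


aborted = []
leases = LeaseTable(abort_version=aborted.append, duration=60)
leases.renew("alice-session", now=0)
leases.renew("bob-session", now=0)
leases.renew("alice-session", now=50)  # alice is still at her desk
leases.reap(now=70)                    # bob walked away
print(aborted)
```

The server would call reap() periodically; a client that is still
working just has to call renew() more often than the lease duration.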

Shane said a bunch of things, but I'm running out of steam.  I believe
one important thing was that there was no explicit "create a version"
operation.  This may be more of a Zope problem, but it does seem like
it would be useful to have a feature that locked a bunch of objects in
a version without modifying them.  This would enable a very fast
transaction that just locked a bunch of objects to get the version
started.

One alternative to versions for long-running transactions is judicious
use of conflict resolution, but conflict resolution has its own
problems.  (It is still a very useful feature!)  The conflict
resolution code can only look at one object at a time, and then it
only gets to look at the pickled state, not a live object.  It also
runs inside a ZEO server, which means a potentially large amount of
work could have to be performed when the long-running transaction
finally commits.
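
For reference, the conflict resolution hook is a method named
_p_resolveConflict that receives three states and returns the merged
one.  The sketch below uses a plain class standing in for a persistent
one, so it runs without ZODB installed; the Counter class and the
assumption that each state is a dict with a "value" key are
illustrative only.

```python
class Counter:
    """Plain stand-in for a persistent counter object."""

    def __init__(self):
        self.value = 0

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   state both transactions started from
        # saved_state: state already committed by the other transaction
        # new_state:   state this transaction is trying to commit
        # Only states are available here, one object at a time, so the
        # merge cannot consult any other object.
        resolved = dict(new_state)
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved


merged = Counter()._p_resolveConflict(
    {"value": 10},  # common starting point
    {"value": 13},  # the other transaction added 3
    {"value": 12},  # this transaction added 2
)
print(merged)
```

Both increments survive in the merged state, which works for a counter
but not for changes whose correctness depends on other objects.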

That's it for the summary of issues.  No conclusions yet, but we'll
get to that soon.