[Zope-ZEO] How do ZODB transactions work?

Fri, 29 Sep 2000 10:00:48 -0400

On 28 September 2000, Chris McDonough said:
[I explain the situation]
> The context is this: we (currently) have a single-threaded web
> application server with a single connection to a single database.  Most
> of the data in that database is our domain objects, which *should* get
> committed only after explicit user action and careful validation to
> ensure consistency.  However, we also put session data in the database,
> because we want it to persist across server restarts and this is a
> convenient way to do so.  Also, most session objects keep a reference to
> domain objects -- eg. if someone has logged in on a session, that
> session keeps a ref to the corresponding User object.  So if we put the
> sessions in the same ZODB as the domain objects, it Just Works.

[Chris responds]
> You may want to consider mounted databases for this, unless you're using a
> nonundo storage for everything.  For more info, see CoreSessionTracking in
> the dev.zope.org Current Projects section.  Mounted database transactions
> are, however, controlled by the sole global transaction manager.  It sounds
> like you want to manually set up two connections to different databases, one
> for domain objects and one for session objects.
[...]
> You need two connections for this, and you need to manage them manually.  If
> you didn't, you'd be redefining the term "transaction".  I think you really
> mean connection.  In Zope, there is a pool of connections shared between
> all the threads.  Each connection represents a copy of a database.

I suspect you're right about my terminological confusion; my knowledge
of databases is almost entirely folkloric, and when you get right down
to it I don't really *know* what a transaction is.  (That's why I really
wish this stuff were documented!)  So if you say I need two connections,
fine -- I need two connections.  I'll buy that!

OK, I've played around a bit and I think I see how to have two
connections to the same database: it looks like I need a *single*
Storage object, because if I try to open two Storages to the same file,
I get a locking conflict (I'm not using ZEO here -- that's in the works,
but I want to reserve the right to use a plain vanilla
{File,Berkeley}Storage.)  So the setup is unexciting:

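  from ZODB import DB
  from ZODB.FileStorage import FileStorage

  # A single Storage object; a second FileStorage on this file hits the lock.
  filename = "/tmp/mxdb.fs"
  storage = FileStorage(filename)

Now open two connections to the same file:

  # Each DB object wraps the same storage and hands out one connection.
  db1 = DB(storage)
  conn1 = db1.open()
  root1 = conn1.root()

  db2 = DB(storage)
  conn2 = db2.open()
  root2 = conn2.root()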

Fetch the same object from each connection, and see what we have:

  me1 = root1['user_db'].users['gward']
  me2 = root2['user_db'].users['gward']

  print `me1`, `me1._p_oid`
  print `me2`, `me2._p_oid`

Output of this step:

  <User at 82401c0: gward> '\000\000\000\000\000\000\000!'
  <User at 824dfd8: gward> '\000\000\000\000\000\000\000!'

...ie. two in-memory "copies" of the same database object, which leads
me to believe I can update them independently.  Let's see:

  me1.prefix = "Sir"
  print me1.format_realname(include_prefix=1)
  print me2.format_realname(include_prefix=1)

And the output from this step is:

  Sir Greg Ward
  Greg Ward

Cool!  I think this is (part of) the behaviour I was looking for -- same
database object, different in-memory "facade" presented to the Python
code working on it.

The clue I'm missing is how to "unify" the two divergent versions: how
do I commit the changes to me1 so that me2 also becomes "Sir Greg Ward",
or alternatively, how do I abort the changes to me1 so that the change
is never seen?
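
The behaviour being asked for can be modelled in a few lines of plain
Python.  This is a toy sketch, not ZODB code: ToyStore, ToyConnection and
their get/commit/abort/sync methods are made-up names, the "objects" are
plain dicts, and there is no conflict detection.  It only shows the
intended semantics: each connection works on private copies, commit
publishes them, abort discards them.

```python
import copy


class ToyStore:
    """Stands in for the shared storage: committed state only."""

    def __init__(self, objects):
        self.committed = dict(objects)


class ToyConnection:
    """Holds private working copies, like the me1 / me2 objects above."""

    def __init__(self, store):
        self.store = store
        self.working = {}

    def get(self, oid):
        # Load a private copy on first access; later reads reuse it.
        if oid not in self.working:
            self.working[oid] = copy.deepcopy(self.store.committed[oid])
        return self.working[oid]

    def commit(self):
        # Publish this connection's copies to the shared store.
        for oid, obj in self.working.items():
            self.store.committed[oid] = copy.deepcopy(obj)

    def abort(self):
        # Throw away local copies; the next get() reloads committed state.
        self.working.clear()

    # In this toy, syncing is the same as discarding the local copies.
    sync = abort


store = ToyStore({"gward": {"prefix": "", "name": "Greg Ward"}})
conn1, conn2 = ToyConnection(store), ToyConnection(store)

conn1.get("gward")["prefix"] = "Sir"
before = conn2.get("gward")["prefix"]   # conn2 still holds the old copy

conn1.commit()
conn2.sync()
after = conn2.get("gward")["prefix"]    # conn2 reloads the committed value
```

Calling conn1.abort() instead of conn1.commit() would drop the "Sir"
change, and neither connection would ever see it.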

> You may actually need two separate databases.

That had occurred to me.  The problem is that our session objects have
references to domain objects (eg. every Session has a 'user' attribute,
which is the User object logged in on this session), and we don't want
to accidentally copy our whole complicated web of long-lived domain
objects into the database of transient session objects.  Or can separate
databases somehow share objects?  Or should we use
'__{set,get}state__()' hooks to reduce those object references to
strings when the sessions are pickled?  That seems dodgy: it would be
too easy to forget to add things to the hooks when adding attributes to
a session class.  (We actually have one main session class and a whole
family of "sub-sessions", to keep from cluttering up the main session
class.)
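
For reference, the hook-based idea might look roughly like the sketch
below.  Everything in it is an assumption made for illustration: the
USERS dict stands in for the domain-object database, and User, Session
and the user_id attribute are simplified, hypothetical versions of the
real classes.  pickle normally calls these hooks itself; they are called
by hand here so the stored state is visible.

```python
import pickle

# Hypothetical stand-in for the domain-object database.
USERS = {}


class User:
    def __init__(self, user_id, realname):
        self.user_id = user_id
        self.realname = realname


class Session:
    def __init__(self, user):
        self.user = user
        self.hits = 0

    def __getstate__(self):
        # Copy the attribute dict and swap the User object for its id.
        state = self.__dict__.copy()
        state["user"] = self.user.user_id
        return state

    def __setstate__(self, state):
        # Restore the attributes, then look the User up again by id.
        self.__dict__.update(state)
        self.user = USERS[state["user"]]


USERS["gward"] = User("gward", "Greg Ward")
session = Session(USERS["gward"])
session.hits = 3

# What would be stored for the session: a string id, not the User itself.
state = session.__getstate__()
data = pickle.dumps(state)

# What loading does: build an empty Session and hand it the saved state.
restored = Session.__new__(Session)
restored.__setstate__(pickle.loads(data))
```

The weak point is the one described above: every new attribute that
refers to a domain object needs its own line in both hooks.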

> Maybe.  :-) It may be problematic to successfully associate a request with
> an existing (uncommitted) transaction as sessions are continued across
> requests.  I guess it's common in Java servlets to do this by storing a
> database connection inside a session variable, although this notion is
> foreign in the Zope world as a request == a transaction.

But "HTTP request == transaction" is an artifice of Zope, not ZODB.
That artifice will be carried over to our *session* machinery, for much
the same reason: if the server shuts down (or crashes), we want to
remember what the user was doing in this session!

We also have to worry about long-lived transactions on objects that
might only get committed after a number of HTTP requests -- eg. if you
have to go through a series of forms to update some object graph, then
you only want to commit those objects when you've successfully processed
the last form.  If we do a global, database-wide commit on every HTTP
request, we *lose*: transient session objects get committed correctly,
but long-lived domain objects can be committed alongside them in an
inconsistent state.  That's what we're currently doing, and we need to
fix it before we can put this system into production use.

So far it seems that my original hunch was right, except I was
misunderstanding what a "transaction" is.  If we have to maintain N+1
connections -- one per open session, and then one for the whole server
to commit *session* changes -- that's fine.  Whatever.  I just need to
know how to commit the transaction *for a particular connection*, rather
than whatever 'get_transaction()' happens to return.

Thanks!

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange / CNRI                           voice: +1-703-262-5376
Reston, Virginia, USA                            fax: +1-703-262-5367