[Zope-ZEO] ZODB database corruption under multiple connections

Jim Fulton jim@digicool.com
Thu, 18 Jan 2001 15:40:02 -0500


Lennard van der Feltz wrote:
> 
> The script shown below illustrates an issue where it appears that
> transactions don't rollback completely when a ConflictError occurs in one of
> two connections.

This is a correct analysis.  I've checked fixes for this into the Zope
public CVS:

  http://www.zope.org/Resources/CVS

The file Connection.py was affected.

You can get these changes there.  The changes will be included in Zope 2.3
(although they didn't get into Zope 2.3 beta 1).  I'm sure they'll make it
into Andrew's distribution before long.

When objects are added to the ZODB, metadata is set on them.
This metadata was not cleared when a transaction was rolled back,
so to subsequent transactions the objects looked as if they had
already been added and were up to date.
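
To illustrate the failure mode, here is a toy model. It is not the
code in Connection.py: ToyObject, ToyConnection, and the oid/jar
attributes are hypothetical stand-ins for a persistent object, a
connection, and the metadata set when an object is added. It uses
present-day Python syntax.

```python
class ToyObject:
    """Hypothetical stand-in for a persistent object."""
    def __init__(self, data):
        self.data = data
        self.oid = None   # assumed metadata: set when added to a connection
        self.jar = None   # assumed metadata: the owning connection


class ToyConnection:
    """Hypothetical stand-in for a database connection."""
    def __init__(self, clear_on_abort):
        self.clear_on_abort = clear_on_abort
        self.stored = {}    # committed records, keyed by oid
        self.pending = []   # objects added in the current transaction
        self.next_oid = 0

    def add(self, obj):
        if obj.oid is not None:
            return False    # looks already saved, so nothing is queued
        obj.oid = self.next_oid
        obj.jar = self
        self.next_oid += 1
        self.pending.append(obj)
        return True

    def commit(self):
        for obj in self.pending:
            self.stored[obj.oid] = obj.data
        self.pending = []

    def abort(self):
        if self.clear_on_abort:
            # The fix, in spirit: forget the metadata of rolled-back objects.
            for obj in self.pending:
                obj.oid = None
                obj.jar = None
        self.pending = []


def retry_after_abort(clear_on_abort):
    """Add an object, roll back, retry with the same object, commit.

    Returns True if the object's record actually reached storage.
    """
    conn = ToyConnection(clear_on_abort)
    item = ToyObject("this is data")
    conn.add(item)
    conn.abort()      # simulates the rollback after a ConflictError
    conn.add(item)    # the retry reuses the same object
    conn.commit()
    return item.oid in conn.stored
```

With clear_on_abort false, the retry sees stale metadata, skips the
object, and nothing is stored for it, which is the "dangling pointer"
case. With clear_on_abort true, the retry stores the object normally.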

Note that this error should not affect Zope applications, as 
any new objects created in a web request are generally recreated
when a transaction is retried.

> As a result the database is corrupted,

FWIW, I don't consider this corruption. The database is
"inconsistent", with "dangling pointers" for some objects.
The database storage integrity is not affected. While I'll
admit that this interpretation of the term "corrupt" may be
arbitrary, it's a useful distinction for ZODB.  For example, 
database corruption is typically addressed by some sort of 
recovery procedure, but there is no recovery procedure that would
fix this problem.

(snip)
> You'll see in the script that I found a crude hack that will hide the
> problem,

It actually addressed the problem directly by resetting some metadata 
indicating (slightly indirectly) that the object is still in need of
saving.

(snip)

> I would appreciate any insights into this.

Hopefully, the explanation above helps.

> Is it a bug that should be
> submitted as such,

I would say "yes" if I hadn't already checked in a fix. :)

> or is there another explanation? Where should I look for
> the root cause? Let me know if you are interested in seeing the other
> scripts.

Thanks for this report. The script you provided was very helpful in 
chasing this down.

I do have some comments on the script below:
 
> Thanks,

Thank you.
 

> ##--------------------------------------------------------------------------
> ##
> #                   Import standard library modules
> import sys, os, threading, time
> sys.path.append(os.curdir) # append current directory to the PYTHONPATH
> ##--------------------------------------------------------------------------
> ##
> #                   Import non-standard library modules
> import ZODB
> from ZODB import FileStorage, PersistentList

PersistentList would have caused me pain if you had used it. 

I wish Andrew would not include this in his distribution in the ZODB
package if I don't include it in mine.

I would be reluctantly willing to include something like
PersistentList in ZODB if I had some evidence that people actually
needed it. Persistent sequences seem to be like persistent mappings
and people sort of expect them to be there for symmetry, but I've
never found a use for them and never seen anyone use them, or at least
keep using them after initial experiments.


> import Persistence, BTree, PersistentMapping
> ##--------------------------------------------------------------------------
> ##
> #                           Program code
> 
> class PItem(Persistence.Persistent):
>     pass
> class PDict(PersistentMapping.PersistentMapping):
>     pass
> 
> def getQdb(fn):
>     fs = FileStorage.FileStorage(fn)
>     db = ZODB.DB(fs,pool_size=7, cache_size=400)
>     return db
> 
> def pack(db):
>     try:
>         db.pack(time.time())
>     except:
>         pass
> 
> class TestFixture:
> 
>     def setUp(self):
>         self.fn = 'TData.fs'
>         self.rrange = 100
>         self.db = getQdb(self.fn)
>         self.conn = self.db.open()
> 
>     def tearDown(self):
>         get_transaction().commit()
>         self.conn.close()
>         self.conn = None
>         pack(self.db)
>         self.db = None
> 
>     def SimpleZODBLoad(self):
>         # create container object and add to root
>         # The problem occurs with BTree but not with a PersistentMapping
>         q = BTree.BTree()
>         r = self.conn.root()
>         r['Q'] = q
>         # write loop
>         for i in range(self.rrange):
>             key = 'A'+str(i)
>             item = PItem()
>             item.data = "this is data"
>             q[key]=item
>             print "\r",i,
>         # print database size
>         qs = len(q)
>         print "\nSimpleZODBLoad Object Count:", qs
> 
>     def W_R_D_TwoConnections(self):
>         r = self.conn.root()
>         q = r['Q']
>         gconn = self.db.open() # open second connection
>         # write, read, and delete loop
>         for i in range(self.rrange):
>             print "\r",i,
>             wrkey = 'B'+str(i)
>             # make Persistent data item
>             item = PItem()
>             item.data = "this is data"
>             # commit and redo transaction loop
>             while 1:
>                 try:
>                     q[wrkey]=item
>                     get_transaction().commit()
>                 except ZODB.POSException.ConflictError:
>                     # uncommenting the following line will compensate for
>                     # but what other effects does it have?

You should add:

                      self.conn.sync()

here. Why? Because you can get a conflict error before the transaction commit.
You could get a ConflictError in the assignment above. If this happens, 
the database connection will not be automatically synchronized, because, 
conceivably, the application might want to take some other action. Without
syncing the connection, you could get an infinite loop.  I did when I
ran your script (see below ;).

When you get a conflict error, you should either close and reopen your
database connection, which will synchronize, or you should explicitly
synchronize.
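
A minimal sketch of that retry pattern, under stated assumptions: the
ConflictError class below is a local stand-in for
ZODB.POSException.ConflictError, FakeConnection only mimics a
connection whose view is stale until sync() is called, and
run_with_retry and its max_attempts cap are hypothetical additions
(the script above retries without a limit). It uses present-day
Python syntax.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB.POSException.ConflictError."""


def run_with_retry(conn, work, commit, max_attempts=10):
    """Run work() and commit(), syncing the connection after each conflict.

    Returns the number of attempts it took. The conflict may be raised
    by work() itself, not only by commit(), so both sit inside the try.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            work()
            commit()
        except ConflictError:
            conn.sync()   # refresh the connection's view before retrying
            continue
        return attempt
    raise ConflictError("gave up after %d attempts" % max_attempts)


class FakeConnection:
    """Hypothetical connection whose view is stale until sync() is called."""
    def __init__(self):
        self.stale = True
        self.syncs = 0

    def sync(self):
        self.stale = False
        self.syncs += 1


def demo():
    conn = FakeConnection()

    def work():
        # Conflicts keep happening for as long as the view is stale.
        if conn.stale:
            raise ConflictError("stale view")

    def commit():
        pass

    return run_with_retry(conn, work, commit), conn.syncs
```

In this toy, dropping the conn.sync() call leaves the view stale, so
every attempt conflicts; the cap on attempts is what turns the
infinite loop into an error.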

(Note that in the version of the software you have, the assignment wouldn't
raise a conflict error, because conflicts were only checked during commits.
Earlier, I checked for conflicts when reading state from the database. I
incorrectly removed these checks because they could lead to conflict errors
on read transactions, which seemed silly.  Unfortunately, the checks were
necessary to avoid reading inconsistent data, and the read checks have
recently been added back.)

>                     continue
>                 else:
>                     break
>             # read and delete an object every fifth time through the loop
>             # using the second connection
>             if not i % 5:
>                 gq = gconn.root()['Q']
>                 while 1:
>                     try:
>                         rdkey = gq.keys()[0]
>                         item = gq[rdkey]
>                         del gq[rdkey]
>                         get_transaction().commit()
>                     except ZODB.POSException.ConflictError:

                          gconn.sync()

ditto.

>                         continue
>                     else:
>                         break
>                 # make sure I can access the retrieved item
>                 d = item.data
>         # close second connection
>         gconn.close()
>         # print database size
>         qs = len(q)
>         print "\nW_R_D_TwoConnections Object Count:", qs
> 
>     def ReadAndDelete(self):
>         r = self.conn.root()
>         q = r['Q']
>         # print database size
>         qs = len(q)
>         print "\nReadAndDelete Starting Object Count:", qs
>         # the following is rather convoluted but necessary to be able to
>         # iterate over all the elements in a BTree
>         items = list(q.items())
>         while items:
>             item = items.pop(0)
>             del q[item[0]]
>             get_transaction().commit()
>             # make sure I can access the retrieved item
>             d = item[1].data
>             print "\r",len(q),
>         # print database size
>         qs = len(q)
>         print "\nReadAndDelete Ending Object Count:", qs
> 
> ##--------------------------------------------------------------------------
> ##
> #                       main program entry point
> def main():
>     tf = TestFixture()
>     # do a simple batch load of objects
>     tf.setUp()
>     tf.SimpleZODBLoad()
>     tf.tearDown()
>     # write and read (and delete) objects over two connections
>     tf.setUp()
>     tf.W_R_D_TwoConnections()
>     tf.tearDown()
>     # read and delete all the objects from the database
>     tf.setUp()
>     tf.ReadAndDelete()
>     tf.tearDown()
> 
> if __name__ == "__main__":
>     main()


Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!        
Technical Director   (888) 344-4332            http://www.python.org  
Digital Creations    http://www.digicool.com   http://www.zope.org