[Zope-ZEO] ZODB database corruption under multiple connections

Lennard van der Feltz lvanderfeltz@knowledgetrax.com
Mon, 15 Jan 2001 18:29:51 -0700


The script shown below illustrates a problem in which transactions do not
appear to roll back completely when a ConflictError occurs in one of two
connections. As a result the database is corrupted, which becomes evident
when the script reads and deletes the objects in the database. At that
point the script stops with a KeyError as follows:

Traceback (innermost last):
  File "E:\Pyscript\MARKET~1\work\dbqu\KEYERR~2.PY", line 147, in ?
    main()
  File "E:\Pyscript\MARKET~1\work\dbqu\KEYERR~2.PY", line 143, in main
    tf.ReadAndDelete()
  File "E:\Pyscript\MARKET~1\work\dbqu\KEYERR~2.PY", line 123, in ReadAndDelete
    d = item[1].data
  File "d:\Program Files\Python\ZODB\Connection.py", line 443, in setstate
    p, serial = self._storage.load(oid, self._version)
  File "d:\Program Files\Python\ZODB\FileStorage.py", line 591, in load
    try: return self._load(oid, version, self._index, self._file)
  File "d:\Program Files\Python\ZODB\FileStorage.py", line 567, in _load
    pos=_index[oid]
KeyError:        m
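
As a toy illustration of what the last frame means: the lookup that fails is
a plain dictionary access on the storage's oid index, so the error is raised
whenever something refers to an oid for which no record was ever written. The
sketch here is not ZODB code; ToyStorage and its methods are made-up names
used only to reproduce the shape of the failure.

```python
# Toy model only: ToyStorage is a hypothetical stand-in, not the ZODB API.
class ToyStorage:
    def __init__(self):
        self._index = {}   # oid -> position of the record in the "file"
        self._file = []    # committed records, in write order

    def store(self, oid, data):
        # Record where the data lives, then append it to the "file".
        self._index[oid] = len(self._file)
        self._file.append(data)

    def load(self, oid):
        pos = self._index[oid]   # raises KeyError for an oid never stored
        return self._file[pos]


def demo():
    storage = ToyStorage()
    # The container record is written and refers to the item by oid...
    storage.store("container", {"B0": "item-oid-1"})
    # ...but the item's own record was never written.
    ref = storage.load("container")["B0"]
    try:
        storage.load(ref)
    except KeyError as exc:
        return "KeyError: %s" % exc.args[0]
    return "loaded"
```

In this model the container loads fine and only the later access to the
referenced item fails, which matches where the traceback stops.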

You'll see in the script that I found a crude hack that hides the problem,
but I am not sure whether it has any side effects. I have been able to
reproduce the problem in other scripts that use a linked list instead of a
BTree. The problem also occurs when the connections are used in separate
threads instead of two connections in one thread. Interestingly, threaded
and non-threaded scripts require different hacks to hide the problem. I have
perused the ZODB source code but have been unable to pinpoint why this is
happening.

It should be noted that while those hacks hide the problem effectively in
simple code such as this, in more complex situations (for instance when the
system is under stress and when using ZEO), they do not always work. The
problem occurs both on Win32 and Linux.
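
For reference, the commit-and-redo loops in the script follow the usual
optimistic-concurrency pattern: attempt the commit and try again if it
conflicts. A minimal, self-contained sketch of that pattern in a toy model
(not ZODB itself) might look like the following. ToyDB, ToyConnection, and
commit_with_retry are hypothetical names, and both the explicit abort()
before each retry and the bounded attempt count are assumptions of the
sketch, not something the script does.

```python
# Toy optimistic-concurrency model; none of these names are ZODB APIs.
class ConflictError(Exception):
    pass


class ToyDB:
    def __init__(self):
        self.data = {}
        self.serial = 0   # bumped on every successful commit


class ToyConnection:
    def __init__(self, db):
        self.db = db
        self.abort()

    def abort(self):
        # Discard pending changes and resynchronize with the database.
        self.pending = {}
        self.seen_serial = self.db.serial

    def set(self, key, value):
        self.pending[key] = value

    def commit(self):
        # Refuse to commit if another connection committed since our last sync.
        if self.seen_serial != self.db.serial:
            raise ConflictError("database changed since last sync")
        self.db.data.update(self.pending)
        self.db.serial += 1
        self.abort()   # start the next transaction from a clean state


def commit_with_retry(conn, key, value, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        conn.set(key, value)   # redo the work on every attempt
        try:
            conn.commit()
        except ConflictError:
            conn.abort()       # assumption: drop stale state before retrying
            continue
        return attempt
    raise ConflictError("gave up after %d attempts" % max_attempts)


def demo():
    db = ToyDB()
    a = ToyConnection(db)
    b = ToyConnection(db)
    b.set("x", 1)
    b.commit()                 # b commits first, so a's view is now stale
    attempts = commit_with_retry(a, "B0", "this is data")
    return attempts, sorted(db.data)
```

Here demo() needs two attempts: the first commit on the stale connection
conflicts, and the second succeeds after the abort resynchronizes it.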

I would appreciate any insights into this. Is it a bug that should be
submitted as such, or is there another explanation? Where should I look for
the root cause? Let me know if you are interested in seeing the other
scripts.

Thanks,

Lennard van der Feltz

##--------------------------------------------------------------------------##
#                   Import standard library modules
import sys, os, threading, time
sys.path.append(os.curdir) # append current directory to the module search path
##--------------------------------------------------------------------------##
#                   Import non-standard library modules
import ZODB
from ZODB import FileStorage, PersistentList
import Persistence, BTree, PersistentMapping
##--------------------------------------------------------------------------##
#                           Program code

class PItem(Persistence.Persistent):
    pass
class PDict(PersistentMapping.PersistentMapping):
    pass


def getQdb(fn):
    fs = FileStorage.FileStorage(fn)
    db = ZODB.DB(fs,pool_size=7, cache_size=400)
    return db

def pack(db):
    try:
        db.pack(time.time())
    except:
        pass


class TestFixture:

    def setUp(self):
        self.fn = 'TData.fs'
        self.rrange = 100
        self.db = getQdb(self.fn)
        self.conn = self.db.open()

    def tearDown(self):
        get_transaction().commit()
        self.conn.close()
        self.conn = None
        pack(self.db)
        self.db = None

    def SimpleZODBLoad(self):
        # create container object and add to root
        # The problem occurs with BTree but not with a PersistentMapping
        # object
        q = BTree.BTree()
##        q = PDict()
        r = self.conn.root()
        r['Q'] = q
        # write loop
        for i in range(self.rrange):
            key = 'A'+str(i)
            item = PItem()
            item.data = "this is data"
            q[key]=item
            print "\r",i,
        # print database size
        qs = len(q)
        print "\nSimpleZODBLoad Object Count:", qs

    def W_R_D_TwoConnections(self):
        r = self.conn.root()
        q = r['Q']
        gconn = self.db.open() # open second connection
        # write, read, and delete loop
        for i in range(self.rrange):
            print "\r",i,
            wrkey = 'B'+str(i)
            # make Persistent data item
            item = PItem()
            item.data = "this is data"
            # commit and redo transaction loop
            while 1:
                try:
                    q[wrkey]=item
                    get_transaction().commit()
                except ZODB.POSException.ConflictError:
                    # uncommenting the following line will compensate for
                    # the problem
                    # but what other effects does it have?
##                    if item._p_oid: item._p_changed = 1
                    continue
                else:
                    break
            # read and delete an object every fifth time through the loop
            # using the second connection
            if not i % 5:
                gq = gconn.root()['Q']
                while 1:
                    try:
                        rdkey = gq.keys()[0]
                        item = gq[rdkey]
                        del gq[rdkey]
                        get_transaction().commit()
                    except ZODB.POSException.ConflictError:
                        continue
                    else:
                        break
                # make sure I can access the retrieved item
                d = item.data
        # close second connection
        gconn.close()
        # print database size
        qs = len(q)
        print "\nW_R_D_TwoConnections Object Count:", qs

    def ReadAndDelete(self):
        r = self.conn.root()
        q = r['Q']
        # print database size
        qs = len(q)
        print "\nReadAndDelete Starting Object Count:", qs
        # the following is rather convoluted but necessary to be able to
        # iterate over all the elements in a BTree
        items = list(q.items())
        while items:
            item = items.pop(0)
            del q[item[0]]
            get_transaction().commit()
            # make sure I can access the retrieved item
            d = item[1].data
            print "\r",len(q),
        # print database size
        qs = len(q)
        print "\nReadAndDelete Ending Object Count:", qs

##--------------------------------------------------------------------------##
#                       main program entry point
def main():
    tf = TestFixture()
    # do a simple batch load of objects
    tf.setUp()
    tf.SimpleZODBLoad()
    tf.tearDown()
    # write and read (and delete) objects over two connections
    tf.setUp()
    tf.W_R_D_TwoConnections()
    tf.tearDown()
    # read and delete all the objects from the database
    tf.setUp()
    tf.ReadAndDelete()
    tf.tearDown()

if __name__ == "__main__":
    main()
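
The ReadAndDelete method above copies the items into a list before deleting
anything, so that the container is never modified while it is being walked.
The same idea, shown on a plain dict as a stand-in for the BTree (drain is a
hypothetical helper, not part of the script):

```python
def drain(mapping):
    """Delete every entry from mapping, returning the (key, value) pairs removed."""
    removed = []
    # list(...) takes a snapshot, so deleting inside the loop is safe.
    for key, value in list(mapping.items()):
        del mapping[key]
        removed.append((key, value))
    return removed
```

Iterating over the snapshot rather than the live container is the point of
the sketch; whether a BTree needs exactly this treatment depends on the BTree
implementation in use.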