Success!!! Re: [ZODB-Dev] ZODB Deadlock issue python frame trace

John D. Heintz jheintz@isogen.com
Mon, 21 May 2001 11:21:50 -0500


Jeremy,

That looks to have done the trick!  I re-ran my tests and didn't get a si=
ngle=20
lock up.  Thanks for all the hard work!

John

On Friday 18 May 2001 11:23, Jeremy Hylton wrote:
> John,
>
> I've got a(nother) potential fix for the problem you discovered.  I'm
> much happier with this fix, because I think it solves the problem
> pretty much the right way.
>
> The cause of the bug (can't remember if I explained it on the list or
> in private email) is that tpc_finish() acquires the storage and db
> locks in a different order than DB.open().  As a result, if each
> starts at the same time and gets one of the two locks it needs, the
> system will be deadlocked.
>
> For a deadlock problem like this, there seem to be two a couple of
> simple strategies we can use to fix it.  One is to see if the locks
> cover more data than they need to; if they do, we can use two
> different locks that each cover less data.  Unfortunately, that
> approach doesn't work here.  The DB lock that is held during open()
> and during the invalidation phase of tpc_finish() are protecting
> exactly the same data structures.
>
> The other simple strategy is to enforce a consistent locking order.
> If a thread is going to hold the DB lock and the storage lock, it MUST
> acquire the DB lock first.  That's what my proposed solution does.
>
> The DB object gets methods called begin_invalidation() and
> finish_invalidation() that acquire and release the DB lock
> respectively.  Before the Connection calls tpc_finish() on the
> storage, it calls begin_invalidation().  This guarantees that the DB
> acquired before the storage lock.
>
> When the invalidation phase is over, the Connection calls
> end_invalidation() to release the DB lock.  This is an optimization.
> It could wait until tpc_finish() returns, but we know that the DB will
> not be used again for the rest of the tpc_finish() and tpc_finish()
> could take a long time.
>
> There is a patch below.  It passes that FileStorage test suite, but
> that doesn't really test the locking behavior.  Does it solve your
> problem?
>
> Jeremy
>
> Index: Connection.py
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvs-repository/Zope2/lib/python/ZODB/Connection.py,v
> retrieving revision 1.53
> diff -c -c -r1.53 Connection.py
> *** Connection.py=092001/05/17 18:35:10=091.53
> --- Connection.py=092001/05/18 16:06:16
> ***************
> *** 683,700 ****
>       def tpc_finish(self, transaction):
>
>           # It's important that the storage call the function we pass
> !         # (self.tpc_finish_) while it still has it's lock.  We don't
> !         # want another thread to be able to read any updated data
> !         # until we've had a chance to send an invalidation message to
> !         # all of the other connections!
>
>           if self._tmp is not None:
>               # Commiting a subtransaction!
>               # There is no need to invalidate anything.
> !             self._storage.tpc_finish(transaction, self._invalidate_su=
b)
>               self._storage._creating[:0]=3Dself._creating
>               del self._creating[:]
>           else:
>               self._storage.tpc_finish(transaction,
>                                        self._invalidate_invalidating)
>
> --- 656,674 ----
>       def tpc_finish(self, transaction):
>
>           # It's important that the storage call the function we pass
> !         # (self._invalidate_invalidating) while it still has it's
> !         # lock.  We don't want another thread to be able to read any
> !         # updated data until we've had a chance to send an
> !         # invalidation message to all of the other connections!
>
>           if self._tmp is not None:
>               # Commiting a subtransaction!
>               # There is no need to invalidate anything.
> !             self._storage.tpc_finish(transaction)
>               self._storage._creating[:0]=3Dself._creating
>               del self._creating[:]
>           else:
> +             self._db.begin_invalidation()
>               self._storage.tpc_finish(transaction,
>                                        self._invalidate_invalidating)
>
> ***************
> *** 703,715 ****
>
>       def _invalidate_invalidating(self):
>           invalidate=3Dself._db.invalidate
> !         for oid in self._invalidating: invalidate(oid, self)
> !
> !     def _invalidate_sub(self):
> !         # There's no point in invalidating any objects in a
> subtransaction !         #
> !         # Because we may ultimately abort the containing transaction.
> !         pass
>
>       def sync(self):
>           get_transaction().abort()
> --- 677,685 ----
>
>       def _invalidate_invalidating(self):
>           invalidate=3Dself._db.invalidate
> !         for oid in self._invalidating:
> !             invalidate(oid, self)
> !         self._db.finish_invalidation()
>
>       def sync(self):
>           get_transaction().abort()
>
> Index: DB.py
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvs-repository/Zope2/lib/python/ZODB/DB.py,v
> retrieving revision 1.29
> diff -c -c -r1.29 DB.py
> *** DB.py=092001/05/17 18:35:10=091.29
> --- DB.py=092001/05/18 16:07:08
> ***************
> *** 311,316 ****
> --- 311,326 ----
>       def importFile(self, file):
>           raise 'Not yet implemented'
>
> +     def begin_invalidation(self):
> +         # Must be called before first call to invalidate and before
> +         # the storage lock is held.
> +         self._a()
> +
> +     def finish_invalidation(self):
> +         # Must be called after begin_invalidation() and after final
> +         # invalidate() call.
> +         self._r()
> +
>       def invalidate(self, oid, connection=3DNone, version=3D'',
>                      rc=3Dsys.getrefcount):
>           """Invalidate references to a given oid.
> ***************
> *** 320,357 ****
>           passed in to prevent useless (but harmless) messages to the
>           connection.
>           """
> !         if connection is not None: version=3Dconnection._version
> !         self._a()
> !         try:
>
> !             # Update modified in version cache
> !             h=3Dhash(oid)%131
> !             cache=3Dself._miv_cache
> !             o=3Dcache.get(h, None)
> !             if o and o[0]=3D=3Doid: del cache[h]
> !
> !             # Notify connections
> !             pools,pooll=3Dself._pools
> !             for pool, allocated in pooll:
> !                 for cc in allocated:
>                       if (cc is not connection and
>                           (not version or cc._version=3D=3Dversion)):
> -                         if rc(cc) <=3D 3:
> -                             cc.close()
>                           cc.invalidate(oid)
> !
> !             temps=3Dself._temps
> !             if temps:
> !                 t=3D[]
> !                 for cc in temps:
> !                     if rc(cc) > 3:
> !                         if (cc is not connection and
> !                             (not version or cc._version=3D=3Dversion)=
):
> !                             cc.invalidate(oid)
> !                         t.append(cc)
> !                     else: cc.close()
> !                 self._temps=3Dt
> !         finally: self._r()
>
>       def invalidateMany(self, oids=3DNone, version=3D''):
>           if oids is None: self.invalidate(None, version=3Dversion)
> --- 330,362 ----
>           passed in to prevent useless (but harmless) messages to the
>           connection.
>           """
> !         if connection is not None:
> !             version=3Dconnection._version
> !         # Update modified in version cache
> !         h=3Dhash(oid)%131
> !         o=3Dself._miv_cache.get(h, None)
> !         if o is not None and o[0]=3D=3Doid: del self._miv_cache[h]
>
> !         # Notify connections
> !         for pool, allocated in self._pools[1]:
> !             for cc in allocated:
> !                 if (cc is not connection and
> !                     (not version or cc._version=3D=3Dversion)):
> !                     if rc(cc) <=3D 3:
> !                         cc.close()
> !                     cc.invalidate(oid)
> !
> !         temps=3Dself._temps
> !         if temps:
> !             t=3D[]
> !             for cc in temps:
> !                 if rc(cc) > 3:
>                       if (cc is not connection and
>                           (not version or cc._version=3D=3Dversion)):
>                           cc.invalidate(oid)
> !                     t.append(cc)
> !                 else: cc.close()
> !             self._temps=3Dt
>
>       def invalidateMany(self, oids=3DNone, version=3D''):
>           if oids is None: self.invalidate(None, version=3Dversion)
>
>
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zodb-dev

--=20
=2E . . . . . . . . . . . . . . . . . . . . . . .

John D. Heintz | Senior Engineer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com

w w w . d a t a c h a n n e l . c o m