[ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)

Marius Gedminas marius at gedmin.as
Thu Aug 30 16:14:49 UTC 2012


On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius at gedmin.as> wrote:
> > On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
> >> On Tue, 28 Aug 2012 16:31:20 +0200,
> >> Martijn Pieters <mj at zopatista.com> wrote :
> >> > Anything else different? Did you make any performance comparisons
> >> > between RelStorage and NEO?
> >>
> >> I believe the main difference compared to all other ZODB Storage
> >> implementation is the finer-grained locking scheme: in all storage
> >> implementations I know, there is a database-level lock during the
> >> entire second phase of 2PC, whereas in NEO transactions are serialised
> >> only when they alter a common set of objects.
> >
> > This could be a compelling point.  I've seen deadlocks in an app that
> > tried to use both ZEO and PostgreSQL via the Storm ORM.  (The thread
> > holding the ZEO commit lock was blocked waiting for the PostgreSQL
> > commit to finish, while the PostgreSQL server was waiting for some other
> > transaction to either commit or abort -- and that other transaction
> > couldn't proceed because it was waiting for the ZEO lock.)
> 
> This sounds like an application/transaction configuration problem.

*shrug*

Here's the code to reproduce it: http://pastie.org/4617132

> To avoid this sort of deadlock, you need to always commit in a
> consistent order.  You also need to configure ZEO (or NEO)
> to time-out transactions that take too long to finish the second phase.

The deadlock happens in tpc_begin() in both threads, which is the first
phase, AFAIU.

AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
the ZEO commit lock.  Then it enters tpc_begin() for Storm's
StoreDataManager and blocks waiting for a response from PostgreSQL --
which is delayed because the PostgreSQL server is waiting to see if
the other thread, Thread #1, will commit or abort _its_ transaction, which
is conflicting with the one from Thread #2.

Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the
ZEO commit lock held by Thread #2.
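
The cycle described above can be modelled with two plain locks.  This is
only a sketch under assumptions, not the code from the pastie: the lock
names, the run_scenario() helper and the timeouts are all hypothetical,
and a threading.Lock stands in for both the ZEO commit lock and whatever
PostgreSQL is holding on behalf of Thread #1's open transaction.  The
timeouts are there so the script reports the deadlock instead of hanging.

```python
import threading


def run_scenario(timeout=0.5):
    """Model the two-thread cycle; return what each thread observed."""
    zeo_commit_lock = threading.Lock()  # stand-in for the ZEO commit lock
    pg_row_lock = threading.Lock()      # stand-in for a PostgreSQL lock
    t1_has_row = threading.Event()
    t2_has_zeo = threading.Event()
    # Each thread keeps its own lock until both have tried for the other,
    # so neither attempt can succeed merely because the peer gave up.
    both_tried = threading.Barrier(2)
    results = {}

    def thread1():
        # Assumption: Thread #1 already has uncommitted changes in
        # PostgreSQL, so the server holds a lock for it.
        with pg_row_lock:
            t1_has_row.set()
            t2_has_zeo.wait()
            # tpc_begin() on the ZODB side: needs the ZEO commit lock.
            got = zeo_commit_lock.acquire(timeout=timeout)
            results["thread1"] = "ok" if got else "blocked on ZEO commit lock"
            both_tried.wait()
            if got:
                zeo_commit_lock.release()

    def thread2():
        t1_has_row.wait()
        # tpc_begin() for ClientStorage: takes the ZEO commit lock first.
        with zeo_commit_lock:
            t2_has_zeo.set()
            # tpc_begin() for the Storm data manager: waits on PostgreSQL.
            got = pg_row_lock.acquire(timeout=timeout)
            results["thread2"] = "ok" if got else "blocked on PostgreSQL"
            both_tried.wait()
            if got:
                pg_row_lock.release()

    threads = [threading.Thread(target=thread1),
               threading.Thread(target=thread2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_scenario())
```

Without the timeouts both acquire() calls would wait forever, which is
the hang seen in the real application.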

I'm too fried right now to understand who's at fault here.

Workarounds probably exist (use RelStorage instead of ZEO?  Configure
Storm to use a lower PostgreSQL transaction isolation level?).  Maybe
this problem would go away if Storm always went into tpc_begin() before
ZEO.
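
To illustrate the last idea: if every thread waits on PostgreSQL before
it touches the ZEO commit lock, nobody ever holds the ZEO lock while
waiting for the database, so the cycle cannot form.  Again a simplified
sketch with hypothetical names (commit_storm_first(), the two locks); it
models "Storm's tpc_begin() first" as plain lock ordering and says
nothing about how the transaction package would actually be made to
order the two data managers that way.

```python
import threading


def commit_storm_first(names=("thread1", "thread2")):
    """Every thread takes the PostgreSQL stand-in lock before the ZEO one."""
    pg_row_lock = threading.Lock()      # stand-in for the PostgreSQL wait
    zeo_commit_lock = threading.Lock()  # stand-in for the ZEO commit lock
    committed = []

    def worker(name):
        # Storm-side tpc_begin() first: may block on PostgreSQL, but the
        # thread does not hold the ZEO commit lock while it waits.
        with pg_row_lock:
            # ZODB-side tpc_begin() second: take the ZEO commit lock.
            with zeo_commit_lock:
                committed.append(name)  # both phases done, "commit"

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return committed


if __name__ == "__main__":
    print(sorted(commit_storm_first()))
```

Both workers finish here because they agree on the order in which the
two locks are taken; which one commits first is not determined.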

I've pinged the people in #storm on FreeNode about this, but haven't
filed any bugs yet.

Marius Gedminas
-- 
Q: Wanting both frequent updates and stability/support is just wishing for a
   pony!
A: Well, we're riding our ponies to the tune of several billion page views per
   month. Where's your pony? Oh, you didn't get one?
                -- http://meta.wikimedia.org/wiki/Wikimedia_Ubuntu_migration_FAQ