[ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)

Shane Hathaway shane at hathawaymix.org
Thu Aug 30 17:19:22 UTC 2012


On 08/30/2012 10:14 AM, Marius Gedminas wrote:
> On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
>> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius at gedmin.as> wrote:
>>> On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
>>>> On Tue, 28 Aug 2012 16:31:20 +0200,
>>>> Martijn Pieters <mj at zopatista.com> wrote :
>>>>> Anything else different? Did you make any performance comparisons
>>>>> between RelStorage and NEO?
>>>>
>>>> I believe the main difference compared to all other ZODB Storage
>>>> implementations is the finer-grained locking scheme: in all storage
>>>> implementations I know, there is a database-level lock during the
>>>> entire second phase of 2PC, whereas in NEO transactions are serialised
>>>> only when they alter a common set of objects.
>>>
>>> This could be a compelling point.  I've seen deadlocks in an app that
>>> tried to use both ZEO and PostgreSQL via the Storm ORM.  (The thread
>>> holding the ZEO commit lock was blocked waiting for the PostgreSQL
>>> commit to finish, while the PostgreSQL server was waiting for some other
>>> transaction to either commit or abort -- and that other transaction
>>> couldn't proceed because it was waiting for the ZEO lock.)
>>
>> This sounds like an application/transaction configuration problem.
>
> *shrug*
>
> Here's the code to reproduce it: http://pastie.org/4617132
>
>> To avoid this sort of deadlock, you need to always commit in
>> a consistent order.  You also need to configure ZEO (or NEO)
>> to time-out transactions that take too long to finish the second phase.
>
> The deadlock happens in tpc_begin() in both threads, which is the first
> phase, AFAIU.
>
> AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
> the ZEO commit lock.  Then it enters tpc_begin() for Storm's
> StoreDataManager and blocks waiting for a response from PostgreSQL --
> which is delayed because the PostgreSQL server is waiting to see if
> the other thread, Thread #1, will commit or abort _its_ transaction, which
> is conflicting with the one from Thread #2.
>
> Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the
> ZEO commit lock held by Thread #2.

So thread 1 acquires in this order:

1. PostgreSQL
2. ZEO

Thread 2 acquires in this order:

1. ZEO
2. PostgreSQL

SQL databases handle deadlocks by detecting them and automatically
rolling back one of the transactions involved, while the "transaction"
package expects all data managers to avoid deadlocks entirely by
committing in the order given by their sortKey methods.
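
To illustrate the ordering point, here is a minimal sketch using plain
Python locks. The Resource class, the commit() helper and the key
strings are hypothetical stand-ins, not ZEO, Storm or PostgreSQL code.
Each thread is handed the two resources in a different order, but
sorting by key before acquiring means both threads lock them in the same
order, so neither can end up holding one lock while waiting for the
other.

```python
import threading


class Resource:
    """Hypothetical stand-in for a lockable backend (not a real ZEO or SQL lock)."""

    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()


def commit(resources, log, name):
    # Sort by key so every thread acquires the locks in the same order,
    # whatever order the caller happened to list them in.
    ordered = sorted(resources, key=lambda r: r.key)
    for r in ordered:
        r.lock.acquire()
    try:
        log.append((name, [r.key for r in ordered]))
    finally:
        for r in reversed(ordered):
            r.lock.release()


def run_demo():
    zeo = Resource("zeo")
    pg = Resource("postgresql")
    log = []
    # The two threads list the resources in opposite orders on purpose.
    t1 = threading.Thread(target=commit, args=([pg, zeo], log, "thread-1"))
    t2 = threading.Thread(target=commit, args=([zeo, pg], log, "thread-2"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    stuck = t1.is_alive() or t2.is_alive()
    return log, stuck
```

Without the sorted() call, the two threads could take the locks in
opposite orders and block each other, which is the shape of the deadlock
described above.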

I haven't looked at the code, but I imagine Storm's StoreDataManager 
implements IDataManager.  I wonder if StoreDataManager provides a 
consistent sortKey.  The sortKey method must return a string (not an
integer or other object) that is the same in every thread yet different
from the sort key of every other participating data manager.
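
As a rough sketch of what I mean (SketchDataManager and the key format
are made up for illustration; this is not Storm's or ZODB's actual
code), a data manager can build its sort key from values that do not
depend on which thread or in which order it was joined:

```python
class SketchDataManager:
    """Hypothetical data manager; only the sortKey() part is shown."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name

    def sortKey(self):
        # A string built only from stable values, so it is the same in
        # every thread and differs between the participating backends.
        return "%s:%s" % (self.kind, self.name)


def commit_order(managers):
    # Assumption: the transaction machinery processes data managers in
    # sortKey order; this helper just mimics that ordering.
    return sorted(managers, key=lambda dm: dm.sortKey())
```

With keys like these, both threads would process the managers in the
same order regardless of the order in which they joined the transaction:

```python
zeo = SketchDataManager("zeo", "main")
sql = SketchDataManager("storm", "main")
assert commit_order([zeo, sql]) == commit_order([sql, zeo])
```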

Shane


