[ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)

Thu Aug 30 20:10:04 UTC 2012

On 30 August 2012 19:19, Shane Hathaway <shane at hathawaymix.org> wrote:
> On 08/30/2012 10:14 AM, Marius Gedminas wrote:
>>
>> On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
>>>
>>> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius at gedmin.as>
>>> wrote:
>>>>
>>>> On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
>>>>>
>>>>> On Tue, 28 Aug 2012 16:31:20 +0200,
>>>>> Martijn Pieters <mj at zopatista.com> wrote :
>>>>>>
>>>>>> Anything else different? Did you make any performance comparisons
>>>>>> between RelStorage and NEO?
>>>>>
>>>>>
>>>>> I believe the main difference compared to all other ZODB Storage
>>>>> implementation is the finer-grained locking scheme: in all storage
>>>>> implementations I know, there is a database-level lock during the
>>>>> entire second phase of 2PC, whereas in NEO transactions are serialised
>>>>> only when they alter a common set of objects.
>>>>
>>>>
>>>> This could be a compelling point.  I've seen deadlocks in an app that
>>>> tried to use both ZEO and PostgreSQL via the Storm ORM.  (The thread
>>>> holding the ZEO commit lock was blocked waiting for the PostgreSQL
>>>> commit to finish, while the PostgreSQL server was waiting for some other
>>>> transaction to either commit or abort -- and that other transaction
>>>> couldn't proceed because it was waiting for the ZEO lock.)
>>>
>>>
>>> This sounds like an application/transaction configuration problem.
>>
>>
>> *shrug*
>>
>> Here's the code to reproduce it: http://pastie.org/4617132
>>
>>> To avoid this sort of deadlock, you need to always commit in a
>>> a consistent order.  You also need to configure ZEO (or NEO)
>>> to time-out transactions that take too long to finish the second phase.
>>
>>
>> The deadlock happens in tpc_begin() in both threads, which is the first
>> phase, AFAIU.
>>
>> AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
>> the ZEO commit lock.  Then it enters tpc_begin() for Storm's
>> StoreDataManager and blocks waiting for a response from PostgreSQL --
>> which is delayed because the PostgreSQL server is waiting to see if
>> the other thread, Thread #1, will commit or abort _its_ transaction, which
>> is conflicting with the one from Thread #2.
>>
>> Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire
>> the
>> ZEO commit lock held by Thread #2.
>
>
> So thread 1 acquires in this order:
>
> 1. PostgreSQL
> 2. ZEO
>
> Thread 2 acquires in this order:
>
> 1. ZEO
> 2. PostgreSQL
>
> SQL databases handle deadlocks by detecting and automatically rolling back
> transactions, while the "transaction" package expects all data managers to
> completely avoid deadlocks using the sortKey method.
>
> I haven't looked at the code, but I imagine Storm's StoreDataManager
> implements IDataManager.  I wonder if StoreDataManager provides a consistent
> sortKey.  The sortKey method must return a string (not an integer or other
> object) that is consistent yet different from all other participating data
> managers.

Storm's DataManager defines sortKey as:

    def sortKey(self):
        # Stores in TPC mode should be the last to be committed, this makes
        # it possible to have TPC behavior when there's only a single store
        # not in TPC mode.
        if self._store._tpc:
            prefix = "zz"
        else:
            prefix = "aa"
        return "%s_store_%d" % (prefix, id(self))

http://bazaar.launchpad.net/~storm/storm/trunk/view/head:/storm/zope/zstorm.py#L320

(By default self._store._tpc is set to False.)

This is essentially similar to zope.sqlalchemy's, the single phase
variant being:

105	    def sortKey(self):
106	        # Try to sort last, so that we vote last - we may commit
in tpc_vote(),
107	        # which allows Zope to roll back its transaction if the RDBMS
108	        # threw a conflict error.
109	        return "~sqlalchemy:%d" % id(self.tx)

http://zope3.pov.lt/trac/browser/zope.sqlalchemy/trunk/src/zope/sqlalchemy/datamanager.py#L105

(The TPC variant simply omits the leading tilde as it is not required
to sort last - zope.sqlalchemy commits in tpc_vote() rather than
tpc_finish() when using single phase commit.)

ZEO's sortKey is:

698	    def sortKey(self):
699	        # If the client isn't connected to anything, it can't have a
700	        # valid sortKey().  Raise an error to stop the transaction early.
701	        if self._server_addr is None:
702	            raise ClientDisconnected
703	        else:
704	            return '%s:%s' % (self._storage, self._server_addr)

http://zope3.pov.lt/trac/browser/ZODB/trunk/src/ZEO/ClientStorage.py#L698

(self._storage defaults to the string '1'.)

This should mean that ZEO always gets a sortKey like '1:./zeosock' in
the example given whereas Storm gets a sortKey like 'aa_storm_12345'
(though the final number will vary per transaction.) Which should mean
a consistent sort order and ZEO always committing first.

It seems StormDataManager only commits in tpc_finish, doing nothing in
either of commit() or tpc_vote() stages when in 1PC mode. As ZEO sorts
first a failure to commit by Storm could never abort the ZEO server's
transaction. Postgres can raise the equivalent of a ConflictError too.
That situation would lead to transaction logging a critical error and
raising the exception. The ZEO transaction would already have finished
by then

Laurence

p.s. It would be interesting to try a zope.sqlalchemy equivalent of
the storm deadlock maker to see whether it holds up any better.