[ZODB-Dev] RelStorage MySQL - StorageError: Unable to acquire commit lock

Wed Aug 12 19:57:39 EDT 2009

On Wed, Aug 12, 2009 at 8:20 PM, Shane Hathaway<shane at hathawaymix.org> wrote:
> Rudá Porto Filgueiras wrote:
>>
>> (...)
>>  Module relstorage.adapters.mysql, line 506, in start_commit
>>  Module relstorage.adapters.mysql, line 672, in _hold_commit_lock
>> StorageError: Unable to acquire commit lock
>>
>> I solved the problem by restarting all instances, and the site became
>> operational again, but I have some questions:
>>
>> Could this be a bug, or is there a problem in my environment/application?
>> Is there another way to release the commit lock without restarting all
>> instances?
>
> Perhaps some instance was taking a very long time to finish a transaction
> commit and you didn't notice it.  RelStorage does everything it can to
> minimize the amount of time the commit lock is held (it uses a strategy
> similar to ZEO), but applications are ultimately in control of how long it
> takes to commit a transaction.
>
> A concurrent pack might provide another explanation.  Have you ever packed
> the database?

No, there is no pack operation running, only people using Plone to add content.

> This could also indicate a bug in MySQL.  According to the documentation of
> get_lock(), all locks will be released when connections terminate, but maybe
> you ran into a MySQL bug that causes locks to stick around.

Yeah, it sticks forever. :-(
Maybe tuning the timeout for inactive connections in MySQL would help?
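
One way to check whether a lock really is stuck, without restarting every
instance, is to ask MySQL which connection holds it. The sketch below is only
an illustration under assumptions: it takes any DB-API cursor (for example one
from MySQLdb), and the lock name is passed in by the caller, because the exact
name RelStorage uses for its commit lock is not stated in this thread. The
helper names are hypothetical. IS_USED_LOCK(), KILL and the wait_timeout
variable are standard MySQL features.

```python
def find_lock_holder(cursor, lock_name):
    """Return the id of the MySQL connection holding lock_name, or None."""
    # IS_USED_LOCK() returns the holder's connection id, or NULL if the lock is free.
    cursor.execute("SELECT IS_USED_LOCK(%s)", (lock_name,))
    row = cursor.fetchone()
    return row[0] if row else None


def kill_lock_holder(cursor, lock_name):
    """Kill the connection holding lock_name; return its id, or None if free."""
    holder = find_lock_holder(cursor, lock_name)
    if holder is not None:
        # Killing the connection releases every lock it obtained with GET_LOCK().
        cursor.execute("KILL %d" % int(holder))
    return holder


def read_wait_timeout(cursor):
    """Return the session's wait_timeout in seconds (idle time before disconnect)."""
    cursor.execute("SHOW VARIABLES LIKE 'wait_timeout'")
    row = cursor.fetchone()  # row looks like ('wait_timeout', '28800')
    return int(row[1])
```

Killing the holder needs a MySQL account with sufficient privileges, and it
aborts whatever that connection was doing, so it is a last resort rather than
a fix for the underlying cause.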

> To me, the most plausible explanation is a MySQL bug, since the other
> hypotheses don't explain why one of the connections terminated prematurely.

I suspect it's MySQL related, because it happens with both Plone 2.5 and
Plone 3.0, and with RelStorage 1.1.3 and 1.2b2.

> If I were you, I would try the same application with PostgreSQL instead of
> MySQL.  If the bug persists, then at least we know it's not a MySQL bug. :-)

I first tried PostgreSQL, but converting the ZODB FileStorage took much
longer than with MySQL, so I decided to use MySQL.

>> But the question remains: why was the database connection not safely
>> closed when tpc_abort failed?
>
> The error message occurring in tpc_abort was "OperationalError: (2006,
> 'MySQL server has gone away')", suggesting that the database connection was
> *already closed*.

Yes, it was closed in the middle of a transaction. But if you are sure
this has nothing to do with RelStorage, and there is no possible
workaround that would let the RelStorage MySQL adapter cope with MySQL
bugs, or be resilient to network failures and recover from this
"deadlock" situation (or it would be an ugly hack), I will move to PostgreSQL.
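
To make the kind of workaround in question concrete, here is a minimal sketch
of reconnect-and-retry on a dropped connection. It is not how RelStorage
works; it is a generic DB-API pattern, and every name in it
(call_with_reconnect, open_connection, operation) is hypothetical. It assumes
the driver raises an exception whose first argument is the MySQL client error
code, as in the "(2006, 'MySQL server has gone away')" message quoted above.
Code 2013 ("Lost connection to MySQL server during query") is treated the
same way.

```python
DISCONNECT_CODES = (2006, 2013)  # server has gone away; lost connection during query


def is_disconnect(exc):
    """True if exc looks like a dropped-connection error from the MySQL client."""
    args = getattr(exc, "args", ())
    return bool(args) and args[0] in DISCONNECT_CODES


def call_with_reconnect(open_connection, operation, max_retries=1):
    """Run operation(conn); on a disconnect, open a new connection and retry.

    open_connection: callable returning a fresh DB-API connection (assumption).
    operation: callable taking a connection; it must be safe to run again
    from the start, since work done on the dead connection is lost.
    """
    conn = open_connection()
    attempt = 0
    while True:
        try:
            return operation(conn)
        except Exception as exc:
            if not is_disconnect(exc) or attempt >= max_retries:
                raise
            attempt += 1
            try:
                conn.close()  # best effort; the socket is probably dead already
            except Exception:
                pass
            conn = open_connection()
```

This only helps for work that can be restarted from the beginning. It does not
by itself release a lock that the server still believes the old connection
holds.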

I also began with MySQL 5.0 from the CentOS 5.0 distribution and later
changed to MySQL 5.4.1; the failure occurs on both.

Cheers,

> Shane

-- 
Rudá Porto Filgueiras
http://python-blog.blogspot.com