[ZODB-Dev] ZEO Client locked in tpc_begin

Thierry Delprat thierry.delprat at unilog.fr
Wed Oct 6 12:06:45 EDT 2004


We are experiencing a complete ZEO client freeze in a production environment:
=> all ZServer threads are busy, but the CPU is idle.

We use :
	- Zope 2.7.2 / Python 2.3.3
	- a CMF site 
	- 1 Mount Point (for the CMF Catalog)
	- 1 ZEO Client, 1 ZSS

After adding a lot of logging, it seems that the freeze occurs in the
tpc_begin method of ClientStorage:
=> the thread enters the "while self._transaction is not None:" loop and
never exits.
As this thread has acquired a global lock (_tpc_cond), all other threads
trying to commit are blocked as well.
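
To make the wait loop described above concrete, here is a minimal sketch of
the pattern. It is a toy model and not the actual ClientStorage code: the
class name ToyStorage and the timeout parameter are assumptions added so
that the demo can give up instead of hanging.

```python
import threading


class ToyStorage:
    """Toy model of the waiting logic; not the real ClientStorage."""

    def __init__(self):
        self._tpc_cond = threading.Condition()
        self._transaction = None

    def tpc_begin(self, txn, timeout=None):
        # 'timeout' is a hypothetical addition for the demo; with
        # timeout=None the loop waits until it is notified.
        with self._tpc_cond:
            while self._transaction is not None:
                if self._transaction is txn:
                    return True  # same transaction again: nothing to do
                if not self._tpc_cond.wait(timeout):
                    return False  # timed out while waiting
            self._transaction = txn
            return True

    def tpc_abort(self, txn):
        with self._tpc_cond:
            if self._transaction is txn:
                self._transaction = None
                self._tpc_cond.notify()  # wake one waiter in tpc_begin


storage = ToyStorage()
first, second = object(), object()

storage.tpc_begin(first)
# 'first' is never released, so 'second' cannot get past the loop.
blocked = not storage.tpc_begin(second, timeout=0.2)

# Once the owner releases the storage, the next tpc_begin succeeds.
storage.tpc_abort(first)
acquired = storage.tpc_begin(second, timeout=0.2)
```

In this model, a transaction that never reaches tpc_abort (or tpc_finish)
leaves _transaction set, and every later tpc_begin for a different
transaction stays in the loop.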


This problem seems to be related to the fact that we encounter a lot of
"Shouldn't load state" errors.
When this error occurs, there is a chance that the current transaction is
not released; the next time we try a transaction on the ClientStorage, we
enter the infinite loop, waiting for the previous transaction to complete.

We succeeded in reproducing this problem in a test environment:
===========================================================================
Configuration:
    1 Zope server with 5 mount points
	(1 is the Temporary Folder for sessions)
    1 ZEO server with 4 storages
	(default storage)
    1 External Method which randomly creates 1 or 4 objects
	(OFSFolder objects distributed randomly across the mount points)
    1 multithreaded script which calls the External Method
	((x simultaneous threads) * y series)

For each session of x simultaneous threads, we wait for the results before
launching the next session.
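
The shape of such a driver can be sketched as follows. This is not our
actual script: run_series and the "call" argument are hypothetical names,
and "call" stands in for whatever triggers the External Method (for
example an HTTP request to the Zope server).

```python
import threading


def run_series(call, x, y):
    """Run y sessions of x simultaneous threads, one session at a time."""
    results = []
    results_lock = threading.Lock()

    def worker(series, index):
        # Record exceptions instead of letting a thread die silently.
        try:
            outcome = call(series, index)
        except Exception as exc:
            outcome = exc
        with results_lock:
            results.append((series, index, outcome))

    for series in range(y):
        threads = [
            threading.Thread(target=worker, args=(series, index))
            for index in range(x)
        ]
        for thread in threads:
            thread.start()
        # Wait for the whole session before launching the next one.
        for thread in threads:
            thread.join()
    return results


# Stand-in for the real request, used only to exercise the driver.
demo_results = run_series(lambda series, index: series * 10 + index, 3, 2)
```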

All storages implement sortKey, and the order of jars=_get_jars (in
Transaction.py) is respected during the whole process in our traces.
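
As an illustration of what the sortKey ordering is meant to guarantee, here
is a minimal sketch. ToyJar and commit_order are hypothetical names, and
the mount point names are made up; this is not the code in Transaction.py.

```python
class ToyJar:
    """Stand-in for a connection (jar) taking part in a transaction."""

    def __init__(self, name):
        self.name = name

    def sortKey(self):
        # Assumption: the key is a string that is stable for the
        # lifetime of the process.
        return self.name


def commit_order(jars):
    # Every transaction visits its jars in the same global order, so two
    # threads can never each hold one storage while waiting for the other.
    return sorted(jars, key=lambda jar: jar.sortKey())


catalog, main, temp = ToyJar("catalog"), ToyJar("main"), ToyJar("temp")

# Two transactions that touched the same jars in a different order
# still call tpc_begin on them in the same sequence.
order_a = [jar.name for jar in commit_order([temp, main, catalog])]
order_b = [jar.name for jar in commit_order([main, catalog, temp])]
```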

We observed the following behaviour:
    we get some "shouldn't load state" errors from Connection.py line 545
    or "NoneType has no attribute tpc_begin"
    or "NoneType has no attribute tpc_abort"
    or "NoneType has no attribute tpc_finish"

The Connection seems to have been closed, or its _storage set to None, by
another thread.

Also, if a connection passes tpc_begin successfully (which sets the
_transaction of the ClientStorage) and then crashes during tpc_abort
(so _transaction is not reset to None), the other threads wait indefinitely
in the tpc_begin of the ClientStorage (_transaction is not None and
_transaction != txn).
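
The failure path described above can be reproduced in a toy model. The
classes below are hypothetical simplifications, not the real Connection or
ClientStorage; close() here simulates another thread closing the
connection in the middle of a commit.

```python
class ToyStorage:
    """Keeps track of which transaction currently owns the storage."""

    def __init__(self):
        self._transaction = None

    def tpc_begin(self, txn):
        self._transaction = txn

    def tpc_abort(self, txn):
        if self._transaction is txn:
            self._transaction = None


class ToyConnection:
    def __init__(self, storage):
        self._storage = storage

    def close(self):
        self._storage = None

    def tpc_begin(self, txn):
        self._storage.tpc_begin(txn)

    def tpc_abort(self, txn):
        # Raises AttributeError when _storage is already None, so the
        # storage itself never hears about the abort.
        self._storage.tpc_abort(txn)


storage = ToyStorage()
conn = ToyConnection(storage)
txn = object()

conn.tpc_begin(txn)
conn.close()  # simulates the other thread

try:
    conn.tpc_abort(txn)
    error = None
except AttributeError as exc:
    error = exc

# The storage still believes 'txn' is in progress.
leaked = storage._transaction is txn
```

In this model the abort fails before it reaches the storage, so
_transaction is never cleared and any other transaction calling tpc_begin
would wait on it.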

This error does not occur very often: we can launch a lot of sessions
without triggering it. However, when we use threadframe to monitor the
ZPublisher threads, we reproduce the problem more easily.

========================================================================

Any help on this subject would be greatly appreciated...

Thierry




