[ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2

Alan Runyan runyaga at gmail.com
Thu Apr 12 13:17:26 EDT 2007


> > Storage: 1
> > Server started: Wed Apr 11 10:56:50 2007
> > Clients: 10
> > Clients verifying: 0
> > Active transactions: -1
>
> Huh? You're owing the system a transaction. However, by looking at the
> code briefly, this might happen if tpc_abort() and _abort() kind of
> overlap. And you did have two aborts at that point in time.

Sounds like a bug/race that needs to be looked into in ZEO.
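
To make the suspected failure mode concrete, here is a toy sketch of how two aborts for the same transaction could push a counter to -1. This is not the ZEO code: the `TxnCounter` class, its `guarded` flag and `run_double_abort` are made up for illustration, and the only assumption is that each abort path decrements the statistic.

```python
class TxnCounter:
    """Toy stand-in for a server-side 'active transactions' statistic."""

    def __init__(self, guarded):
        self.guarded = guarded   # hypothetical switch: ignore stray aborts
        self.active = 0
        self.in_txn = False

    def begin(self):
        self.in_txn = True
        self.active += 1

    def abort(self):
        # The guarded variant ignores an abort when no transaction is open.
        if self.guarded and not self.in_txn:
            return
        self.in_txn = False
        self.active -= 1


def run_double_abort(guarded):
    """Begin one transaction, then abort it twice, as two overlapping
    abort paths might."""
    counter = TxnCounter(guarded)
    counter.begin()
    counter.abort()  # e.g. the abort requested by the client
    counter.abort()  # e.g. a second cleanup abort for the same transaction
    return counter.active


unguarded_result = run_double_abort(guarded=False)
guarded_result = run_double_abort(guarded=True)
```

Without the guard the counter ends one below zero; with it, the second abort is a no-op. Whether ZEO's accounting actually works this way would need to be checked against the source.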

> Ah. The different clients might be because you have two storages and
> your ZEO clients are configured in a way not to connect to exactly the
> same storages? Or they are but they weren't able to.
> (See hardware/network problems.)

They are both defined in zope.conf.

All 12 clients were restarted last night.

Just now I'm seeing:
Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 12
Clients verifying: 0
Active transactions: -1
Commits: 92
Aborts: 2
Loads: 498120
Stores: 2279
Conflicts: 0
Conflicts resolved: 20

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 11
Clients verifying: 0
Active transactions: 0
Commits: 51
Aborts: 0
Loads: 225080
Stores: 6408
Conflicts: 0
Conflicts resolved: 167
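
For watching these numbers over time, a minimal sketch of a parser for the "Key: value" layout shown above, flagging any storage whose active-transaction count has gone negative. The function names and the shortened `SAMPLE` text are hypothetical; how the monitor output is fetched is left out.

```python
SAMPLE = """\
Storage: 1
Server started: Wed Apr 11 10:56:50 2007
Clients: 12
Active transactions: -1
Aborts: 2

Storage: 2
Server started: Wed Apr 11 10:56:50 2007
Clients: 11
Active transactions: 0
Conflicts resolved: 167
"""


def parse_monitor_output(text):
    """Return {storage_name: {stat_name: value}} from 'Key: value' lines."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so timestamps stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Storage":
            current = storages.setdefault(value, {})
        elif current is not None:
            try:
                current[key] = int(value)
            except ValueError:
                current[key] = value  # e.g. the 'Server started' timestamp
    return storages


def suspicious_storages(storages):
    """Names of storages reporting a negative active-transaction count."""
    flagged = []
    for name, stats in storages.items():
        active = stats.get("Active transactions", 0)
        if isinstance(active, int) and active < 0:
            flagged.append(name)
    return sorted(flagged)


parsed = parse_monitor_output(SAMPLE)
flagged = suspicious_storages(parsed)
```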

> Something that came to my mind that might block the ZEO server for a
> long time are hard disk failures. Check your dmesg log. However, the
> network errors you see in various places really need to be tracked down.

Nothing in dmesg.  I find the 'No route to host' errors disturbing, although
they have not happened over the past 24 hours.  This one has:
2007-04-12T00:17:45 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:54881) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')
------

Which is frustrating.  As I understand it, the ZEO server is taking too long
to communicate with the ZEO client, and the client times out.  Shouldn't it
retry / reconnect?
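
By retry / reconnect I mean something along these lines. This is a generic sketch, not ZEO's client code: `call_with_reconnect`, its parameters and the backoff policy are all assumptions, and `connect` / `operation` stand for whatever opens the connection and performs the call.

```python
import socket
import time


def call_with_reconnect(connect, operation, attempts=3, delay=1.0,
                        sleep=time.sleep):
    """Run operation(connect()), reconnecting after a timeout.

    connect   -- callable returning a fresh connection (hypothetical)
    operation -- callable taking that connection and returning a result
    attempts  -- total number of tries before giving up
    delay     -- base delay in seconds, doubled after each failure
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()          # fresh connection on every try
            return operation(conn)
        except (TimeoutError, socket.timeout) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error
```

Only timeouts are retried here; other socket errors, 'No route to host' included, would propagate to the caller unchanged.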

alan

