[ZODB-Dev] ZEO cache problems and a lost ZEO connection

Paul Winkler pw_lists at slinkp.com
Mon Aug 1 17:17:10 EDT 2005


Not entirely sure whether this belongs here or a zope list, but
here goes...

System info on this cluster:
One ZEO server
Two ZEO clients (let's call them ZFred and ZJoe)
All using Zope 2.7.3
 
Twice now I have observed the following pattern:

* Somebody complains to me that ZSyncer is not working
  in that it reports success, but the new data doesn't actually
  appear in pages generated by the cluster.
  (For those not familiar, ZSyncer just uses the "Import / Export"
  feature of Zope, except that the imported package is received via HTTP
  POST and the data is imported from a cStringIO.StringIO() instance
  instead of the filesystem.)
  So it's as if you've imported some data and apparently succeeded,
  but the new data doesn't actually seem to be there.

  (ZFred is the zope server on which the new data gets imported.)

* Later the same day, *while I am looking at the zope management
  interface*, one Zope (the ZJoe client) gets stuck.  
  Responses stop coming out, and the debug (aka trace aka "big M") log
  shows that new requests are coming in but none are completing
  (lots of "B"s and "I"s, no "A"s or "E"s).
  The CPU is mostly idle and there is plenty of free RAM, so
  presumably we are blocking on some I/O.

  In both cases I had done a bit of poking around in the management
  interface with no apparent harm. In both cases, a request to a
  folder's manage_main was the first request in the long series of
  "B and I but no A and E" requests.

* After some time (both times it was around 11-13 minutes), 
  ZJoe gets unstuck and there is a flood of 
  completed requests in the debug log.
  This coincides with a series of ClientDisconnected errors
  in the Zope event log (corresponding to some HTTP 500 errors in the
  access log). (That's why I think it's a ZEO issue and decided
  to ask here.)

* The ZEO server log shows nothing at all unusual during this whole
  time ... all quiet.

* The other ZEO client, ZFred, has been up all this time and reported no
  problems.

* We still don't see the data that we imported unless I either restart
  zope or use the Control Panel to clear the in-memory ZODB cache.
  And then suddenly we see it.
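
The "B and I but no A and E" pattern described above can be checked
mechanically rather than by eyeballing the log. Here is a minimal sketch,
assuming each trace log line has the form "CODE REQUEST_ID TIMESTAMP ..."
with B = begin, I = input received, A = response ready, E = end. The
function name and the sample lines are made up for illustration and are
not taken from the real logs.

```python
def find_stuck_requests(lines):
    """Return ids of requests that logged a B but never an E, in log order.

    Assumes one event per line: CODE REQUEST_ID TIMESTAMP [details...].
    """
    seen = {}   # request id -> set of event codes seen so far
    order = []  # request ids in order of first appearance
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("B", "I", "A", "E"):
            continue  # skip blank or unrecognized lines
        code, req_id = parts[0], parts[1]
        if req_id not in seen:
            seen[req_id] = set()
            order.append(req_id)
        seen[req_id].add(code)
    return [r for r in order if "B" in seen[r] and "E" not in seen[r]]


# Hypothetical sample: request 1000 completes, 1001 and 1002 hang.
sample = [
    "B 1000 2005-08-01T15:01:58 GET /index_html",
    "I 1000 2005-08-01T15:01:58 0",
    "A 1000 2005-08-01T15:01:59 200 5120",
    "E 1000 2005-08-01T15:01:59",
    "B 1001 2005-08-01T15:02:11 GET /folder/manage_main",
    "I 1001 2005-08-01T15:02:11 0",
    "B 1002 2005-08-01T15:02:40 GET /index_html",
    "I 1002 2005-08-01T15:02:40 0",
]

print(find_stuck_requests(sample))
```

The first id in the result is the request that started the stall, which
in both incidents here was the manage_main request.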

So it appears that the sync succeeds, and ZFred successfully got the
ZEO server to store the changes, but both ZFred and ZJoe are using
stale cache data until I restart them or flush the cache.
I have no idea if the ZEO client disconnection is really relevant or
just a nasty coincidence or what.

Is it possibly relevant that we have significant system clock skew?
ZEO server is about 13 minutes slow, one ZEO client (ZJoe) 
is 32 minutes slow, ZFred is 16 minutes slow.
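
If the skew matters at all, it would presumably be the offset between
machines rather than each machine's distance from true time. A quick
sketch of that arithmetic, using only the figures above (the dictionary
keys are just labels, and whether ZEO cares about any of this is exactly
the open question):

```python
from itertools import combinations

# Minutes each machine's clock runs behind true time, as reported above.
minutes_slow = {"ZEO server": 13, "ZJoe": 32, "ZFred": 16}


def relative_skew(offsets):
    """Return the absolute clock difference in minutes for every pair."""
    return {
        (a, b): abs(offsets[a] - offsets[b])
        for a, b in combinations(offsets, 2)
    }


for (a, b), diff in relative_skew(minutes_slow).items():
    print(f"{a} vs {b}: {diff} minutes apart")
```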

I thought it might be simple network issues, but the two ZEO clients are
on the same subnet and the admins swear they weren't monkeying with
the firewall or anything else.

-- 

Paul Winkler
http://www.slinkp.com

