[Zope] ZEO disconnects, Zope auto restarts (via zopectl)

Dennis Allison allison at shasta.stanford.edu
Fri Feb 3 04:00:45 EST 2006

Zope 2.9.0

We are seeing spontaneous restarts of Zope with no indication of the 
cause in any of the standard Zope logs.  The ZEO log indicates that the 
restarts are due to a lost connection between Zope and ZEO, but gives 
no other information.  The logging level is set at the distribution 
default (INFO).

The restarts are a huge problem because session variables are not
persistent, so all of the user state they contain is lost on restart.
In our stateful implementation, this is a major problem.  I want to adjust 
the configuration so that the Zope/ZEO connection is stable.

In our configuration, Zope and ZEO are linked via localhost on a
dedicated port.

I've Googled for information about tuning the ZEO/Zope interface, but 
have found little of substance.  Some additional log detail would be 
helpful.
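
One way to get more detail would presumably be to lower the logging
threshold from INFO to DEBUG in the eventlog section of zeo.conf (and in
the corresponding section of zope.conf for the client side).  A sketch,
assuming the stock eventlog layout and the log path used below; this is
not our current configuration:

<eventlog>
  # assumption: "debug" in place of the default "info"
  level debug
  <logfile>
    path $INSTANCE/log/zeo.log
  </logfile>
</eventlog>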

We are running a fairly vanilla setup, excerpted below:

# ZEO client storage:
<zodb_db main>
  mount-point /
  # ZODB cache, in number of objects
  cache-size 5000
  <zeoclient>
    server localhost:8301
    storage 1
    var $INSTANCE/var
    # ZEO client cache, in bytes
    cache-size 20MB
    # Uncomment to have a persistent disk cache
    client group1-zeo
  </zeoclient>
</zodb_db>

<zeo>
  address localhost:8301
  read-only false
  invalidation-queue-size 100
  pid-filename $INSTANCE/var/ZEO.pid
  # monitor-address PORT
  # transaction-timeout SECONDS
</zeo>

<runner>
  program $INSTANCE/bin/runzeo
  socket-name $INSTANCE/etc/zeo.zdsock
  daemon true
  forever false
  backoff-limit 10
  exit-codes 0, 2
  directory $INSTANCE
  default-to-interactive true
  # user zope
  python /usr/bin/python2.4
  zdrun /usr/local/src/zope/Zope2.9/lib64/python/zdaemon/zdrun.py

  # This logfile should match the one in the zeo.conf file.
  # It is used by zdctl's logtail command, zdrun/zdctl doesn't write it.
  logfile $INSTANCE/log/zeo.log
</runner>

It's not clear what changes will lead to a more stable connection because 
it is not clear what's triggering the problem.  Any advice would be 
appreciated.

Presumably the shotgun approach would work -- increase the cache sizes, 
raise the invalidation-queue-size, and increase the backoff-limit -- but 
it would be nice to have some guidance.


