[Zope3-dev] How to debug zope3 if it completely hangs?

Fabio Tranchitella kobold at kobold.it
Sun Jul 1 15:34:25 EDT 2007


Hi folks,

  I have a complex setup with six Zope 3 (3.3.1, FWIW) instances on two
different machines (three instances on each), with an LVS+apache2+squid
front-end. The Zope instances run with a ZODB root database (I don't need
ZEO, see below) and SQLOS (an RDBMS-object mapper based on SQLObject), and
they connect to a single PostgreSQL 8.2 database server on a third machine.
I use psycopg2da as the database adapter. Using SQLOS and SQLObject, I can
avoid the use of ZEO: the objects in the ZODB database never change because
they are just the "door" to the RDBMS-based objects. With this set-up we
have been serving about one million page views per day since March.

  Now, the problem is that sometimes (about two or three times per day)
some of the Zope instances hang completely and I have to restart them.
Usually one or two instances are affected, but sometimes more. When a Zope
instance hangs, it is not even possible to open the ZMI homepage: if I
telnet to the Zope port, I can type an HTTP request, but the connection
hangs forever and no response is ever sent back.
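
  For illustration, the manual telnet check could be scripted roughly as
follows. This is only a sketch: the function name probe, the timeout value
and the three result strings are my own choices for this example, not part
of Zope or of my actual setup.

```python
import socket


def probe(host, port, timeout=10.0):
    """Classify an HTTP server as 'ok', 'hung' or 'down'.

    'hung' means the TCP connection was accepted (or is pending) but no
    bytes came back within `timeout` seconds, which is the symptom
    described above.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "hung"   # connect itself timed out
    except OSError:
        return "down"   # connection refused, host unreachable, ...
    try:
        sock.settimeout(timeout)
        # Minimal HTTP/1.0 request, like the one typed by hand in telnet.
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        data = sock.recv(1024)
        # An empty read means the peer closed without answering.
        return "ok" if data else "down"
    except socket.timeout:
        return "hung"
    except OSError:
        return "down"
    finally:
        sock.close()
```

  Run periodically against each instance (for example
probe("localhost", 8080), where the port is a placeholder), something like
this would at least record when each instance stops answering.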

  I'm quite sure the problem is related to the connection to PostgreSQL,
and I can't rule out a bug in psycopg2da or sqlos (I maintain both of them),
but so far I haven't been able to reproduce the problem in my testing
environment. Also, the PostgreSQL log file doesn't contain anything really
interesting, apart from a lot of serialization errors, which are handled as
Retry exceptions by the zope3 publisher and are transparent to the end
users.

  My question is simple: I don't know how to investigate the problem when
my Zope instance is no longer responsive. I tried PDB, GDB and friends, but
without success. Is there a good way to find out what is happening and
where zope3 is looping, so that I can fix the bug (if there is one
somewhere)?
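
  To make the question concrete, the kind of information I am after is a
stack trace for every thread of the hung process, along the lines of the
sketch below. It rests on several assumptions: an interpreter that provides
sys._current_frames (Python 2.5 or later, so it may not apply directly to my
instances), a Unix system with SIGUSR1, and a process that can still run
Python signal handlers (which would not be the case if a thread is stuck in
C code while holding the interpreter lock). The names format_thread_stacks
and install_dump_handler are hypothetical.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n"
                      % (thread_id, names.get(thread_id, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def install_dump_handler(signum=None, stream=None):
    """Write all thread stacks to `stream` when `signum` is received.

    Assumption: must be called from the main thread, at startup.
    Defaults to SIGUSR1 and sys.stderr.
    """
    if signum is None:
        signum = signal.SIGUSR1  # Unix only

    def handler(received_signum, frame):
        out = stream if stream is not None else sys.stderr
        out.write(format_thread_stacks())
        out.flush()

    signal.signal(signum, handler)
```

  With such a handler installed, "kill -USR1 <pid>" on a stuck instance
would ideally show which call each worker thread is sitting in.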

  Bear in mind that this happens in the production environment, where the
load is quite high, but not in my testing environment when I stress the
instances with ab2.

Thanks in advance,

-- 
Fabio Tranchitella                         http://www.kobold.it
Free Software Developer and Consultant     http://www.tranchitella.it
_____________________________________________________________________
1024D/7F961564, fpr 5465 6E69 E559 6466 BF3D 9F01 2BF8 EE2B 7F96 1564

