[Zope] System performance threads/processes & random crashes (SIGPIPE)

Doyon, Jean-Francois Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Fri, 22 Mar 2002 14:45:27 -0500


Hello,

Thanks for the help!

Well, I've determined it most likely isn't PostgreSQL, since I switched the
connections from socket-based to TCP-based, and the problem still occurs.

So, I turn my attention to FastCGI ...

I just read this on the FastCGI Website:

If an http client aborts a request before it completes, mod_fastcgi does too
- this results in a SIGPIPE to the FastCGI application. At a minimum,
SIGPIPE should be ignored (applications spawned by mod_fastcgi have this
setup automatically). Ideally, it should result in an early abort of the
request handling within your application and a return to the top of the
FastCGI accept() loop.

I guess Zope isn't handling the SIGPIPE the way it is suggested here?
Anyway, this seems to be the most likely cause of the problems I'm having.
That AND possibly the problem Matt describes.  Matt, where can I find more
information on this, and possible solutions?
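
A minimal sketch of the FastCGI advice quoted above: ignore SIGPIPE, then
treat the failed write as the cue to abandon the request and go back to the
accept loop. It assumes a modern Python 3 (where the failure surfaces as
BrokenPipeError) rather than the Python 2.1.2 in use here, and
handle_request is a hypothetical name for illustration, not part of Zope or
mod_fastcgi. A closed pipe stands in for the aborted client.

```python
import os
import signal

# Ignore SIGPIPE so that a vanished reader cannot kill the process.
# (CPython already does this at startup; it is set explicitly for clarity.)
signal.signal(signal.SIGPIPE, signal.SIG_IGN)


def handle_request(out_fd, body):
    """Hypothetical handler: True if the reply was written, False if the
    client had gone away and the request was abandoned early."""
    try:
        os.write(out_fd, body)
        return True
    except BrokenPipeError:
        # With SIGPIPE ignored, the write fails with EPIPE instead of
        # terminating the process; the caller can return to its accept loop.
        return False


# Simulate a client that aborts: close the read end before replying.
read_fd, write_fd = os.pipe()
os.close(read_fd)
sent = handle_request(write_fd, b"HTTP/1.0 200 OK\r\n\r\n")
os.close(write_fd)
print("reply sent:", sent)
```

The point of the sketch is only that the broken pipe becomes an ordinary
error the application can handle; how Zope's own FastCGI server reacts is
not shown here.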

For now, I'm guessing that switching from sockets to TCP for the FastCGI
connections might help solve the problem? I am getting *A LOT* of these
errors, every 5 to 10 minutes!!! And it *IS* traffic related ... when the
business day dies down, the errors stop occurring (the normal usage pattern
at this time would support the theory that the errors are directly related
to the amount of usage).

I'm also thinking of playing with the -restart-delay option of the
FastCgiServer directive ...

Help!!!

Thank you,
J.F.

-----Original Message-----
From: Chris McDonough [mailto:chrism@zope.com]
Sent: Thursday, March 21, 2002 11:08 AM
To: Doyon, Jean-Francois; zope@zope.org
Subject: Re: [Zope] System performance threads/processes & random
crashes (SIGPIPE)


SIGPIPE is raised by the OS when the application writes to a UNIX pipe (or
socket) whose other end has been closed.  UNIX takes this condition
seriously, which is why it sends the signal to the process telling it
"you've got a broken pipe".
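
The mechanism described above can be reproduced in a few lines. This is a
sketch assuming a POSIX system and Python 3; the child program is purely
illustrative and has nothing to do with Zope's own code. The child restores
the default SIGPIPE action, writes to a pipe that has no reader, and is
terminated by the signal.

```python
import signal
import subprocess
import sys

# Child program: restore the default SIGPIPE action (CPython ignores the
# signal at startup), then write to a pipe whose read end is already closed.
CHILD = r"""
import os, signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
r, w = os.pipe()
os.close(r)
os.write(w, b"x")   # no reader left: the kernel delivers SIGPIPE here
"""

proc = subprocess.run([sys.executable, "-c", CHILD])

# subprocess reports death-by-signal as a negative return code.
killed_by = signal.Signals(-proc.returncode)
print(killed_by.name, int(killed_by))
```

On Linux the signal number reported is 13, the same number that appears in
the zdaemon log line quoted in the original message.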

Since you say it started happening when you began using the database
adapter, it may be that some piece of the database adapter opens a pipe that
is later broken (for whatever reason, that's the $10,000 question ;-),
causing the OS to send Zope a SIGPIPE.

It may be possible to install a signal handler for SIGPIPE to get rid of the
problem, but I'm not exactly sure what it should/would do during this
failure state, and it would be more useful to try to pin down the pipe that
is getting broken by making the problem replicable.

The ZODB pool_size parameter is controlled via the pool_size argument to
ZODB.DB.DB's constructor.  It signifies how many database connections it's
willing to place in the pool.  When Zope starts up, each Zope thread needs
to use its own database connection.  So you should likely never have a
smaller pool_size than the number of threads (the -t parameter to z2.py).
Adjusting these values up and down may improve performance, but to this day
there have not been any empirical studies of how performance is impacted
when you do. It's probably something you need to try out in a load testing
environment.  If you find something interesting, let us know! ;-)
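
To make the threads-versus-pool_size relationship concrete, here is a toy
model in plain Python 3. It is not ZODB code: ToyPool and run are
hypothetical names, and the model assumes that a thread simply waits when
no connection is free, which may not match what ZODB actually does. It only
illustrates why a pool smaller than the thread count caps how many threads
can work at once.

```python
import threading
import time


class ToyPool:
    """Toy pool: at most pool_size connections are handed out at once."""

    def __init__(self, pool_size):
        self._slots = threading.Semaphore(pool_size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def __enter__(self):
        self._slots.acquire()          # waits while the pool is exhausted
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.in_use -= 1
        self._slots.release()


def run(threads, pool_size, requests_per_thread=5):
    """Run worker threads against the pool; return peak connections in use."""
    pool = ToyPool(pool_size)

    def worker():
        for _ in range(requests_per_thread):
            with pool:
                time.sleep(0.001)      # stand-in for handling one request

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return pool.peak


peak_small = run(threads=4, pool_size=2)   # fewer connections than threads
peak_ample = run(threads=4, pool_size=7)   # more connections than threads
print(peak_small, peak_ample)
```

In the first run the peak can never exceed 2, so at any moment at least two
of the four threads are idle; in the second it is bounded by the thread
count instead. Real performance effects still need the load testing
suggested above.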

----- Original Message -----
From: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>
To: <zope@zope.org>
Sent: Thursday, March 21, 2002 9:57 AM
Subject: [Zope] System performance threads/processes & random crashes
(SIGPIPE)


Hello,

I'm running into random crashes of my zope processes, but I'm not finding
any reference anywhere in the mailing list archives or on the site about
this specific one:

I'm getting:

2002-03-21T14:48:52 ERROR(200) zdaemon zdaemon: Thu Mar 21 09:48:52 2002:
Aiieee! 20070 exited with error code: 13

Every now and then, for no apparent reason.  Signal 13 is a SIGPIPE ...

This is Zope 2.5.0 with CMF 1.2 on a severely upgraded/updated/patched RH6.2
... with a Python 2.1.2 built with defaults. It runs with FastCGI to Apache
1.3.2x ...

Usually I just wait a couple of seconds, hit refresh in my browser and
things come back to normal, but it's still annoying, and doesn't look good
to the public.  Note that when this happens, it usually seems to happen to
ALL processes.  It looks to me like the pipes between the master Zope
process and its children die, and they all have to restart for some
reason. Could this be? And if so, why?

Note that I started noticing this when I for the first time started using
Psycopg to create RDBMS connections to my PostgreSQL ... Could there be a
relation somehow?

On a slightly similar topic, how do I manage performance? I plan on using
Zope for a fairly high demand web site ... I noticed I can control how many
processes/threads start, but then I also read something about the ZODB
pool_size ... What is the relation between the two exactly?

Thank you,

Jean-François Doyon
Internet Service Development and Systems Support
GeoAccess Division
Canadian Center for Remote Sensing
Natural Resources Canada
http://atlas.gc.ca
Phone: (613) 992-4902
Fax: (613) 947-2410

