[Zope] Zope hiccuping

Chris Kratz chris.kratz@vistashare.com
Wed, 7 Nov 2001 17:34:04 -0500


Hmmm, I included the big M log output for two crashes in my first email.
Perhaps there is a way to get additional information?  The big M log
only shows that two requests came in and never left the server.  The stupid
log file says that the process exited with error code 11.  It's very easy to
find these events by cross-referencing the postgres log file ('pq_recvbuf:
unexpected EOF on client connection') with the big M log file and the stupid
log file.
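The cross-referencing described above could be partly scripted. The Python sketch below is not from this thread and rests on two assumptions: that each -M log record sits on one line shaped like the excerpt quoted further down (code, request id, timestamp, then fields), and that the number zdaemon prints as its "error code" is the raw status returned by os.waitpid(). If the second assumption holds, a status of 11 would mean the child was killed by signal 11, which is SIGSEGV on Linux. It is written for a modern Python 3, not the Python 2.1 discussed here.

```python
import os
import signal


def unfinished_requests(lines):
    """Return {request_id: (timestamp, method, path, codes_seen)} for every
    request that has a B (begin) line but no matching E (end) line.

    Assumes one whitespace-separated record per line, shaped like
    "<code> <id> <timestamp> [fields...]" (an assumption based on the
    -M excerpt in this thread, not on Zope documentation).
    """
    pending = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        code, req_id, stamp = fields[0], fields[1], fields[2]
        if code == "B":
            method = fields[3] if len(fields) > 3 else ""
            path = fields[4] if len(fields) > 4 else ""
            # A later B with the same id replaces any earlier open entry.
            pending[req_id] = [stamp, method, path, "B"]
        elif code in ("I", "A") and req_id in pending:
            pending[req_id][3] += code  # record how far the request got
        elif code == "E":
            pending.pop(req_id, None)  # request completed normally
    return {rid: tuple(info) for rid, info in pending.items()}


def describe_exit_status(status):
    """Interpret `status` as a raw os.waitpid() status (Unix only).

    Whether zdaemon prints exactly this value is an assumption.
    """
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (sig, name)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unrecognized status %d" % status
```

Passing an open -M log file to unfinished_requests() works because iterating a file yields its lines. Fed the -M excerpt from the first message, it would flag 145934020 and 135053764, the two requests that got B and I lines but never A or E; their timestamps can then be matched against the postgres and zdaemon log entries.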

I think what brought this on was our switching to psycopg from PoPy.  We
have switched back to PoPy and the problem has cleared up, though we are back
to our original problem of server hangs (db threads in UPDATE WAITING mode).
It's frustrating not to have a db adaptor that works with postgres without
problems.  PoPy works great except for the server stalling issue.  I am
still optimistically hoping that psycopg will eventually be our DA of
choice.  But there were enough issues that we went back to PoPy.  Better
the beast you know...

Any other thoughts?

-Chris
------------------------------
Chris Kratz
chris.kratz@vistashare.com

----- Original Message -----
From: "Chris McDonough" <chrism@zope.com>
To: "Chris Kratz" <chris.kratz@vistashare.com>; <zope@zope.org>
Sent: Wednesday, November 07, 2001 4:47 PM
Subject: Re: [Zope] Zope hiccuping


> If it happens regularly (or better, if you can make it happen) it
> would be helpful to collect big M log data for a series of failures.
> Then attempt to find a pattern.
>
> You might find the requestprofiler.py script in "utilities" useful
> when analyzing big M log data.
>
> ----- Original Message -----
> From: "Chris Kratz" <chris.kratz@vistashare.com>
> To: <zope@zope.org>
> Sent: Wednesday, November 07, 2001 3:46 PM
> Subject: Re: [Zope] Zope hiccuping
>
>
> > OK, did a ./configure --without-py_malloc with python 2.1.1 and it didn't
> > take care of the problem.  Still getting:
> >
> > 2001-11-07T20:30:11 ERROR(200) zdaemon zdaemon: Wed Nov  7 15:30:11 2001:
> > Aiieee! 9400 exited with error code: 11
> >
> > Yes, it did compile (make clean, configure, make) and the Makefile has the
> > argument --without-py_malloc
> > The start script is pointed to the newly compiled python and I'm still
> > getting the same error.
> >
> > Aack!
> >
> > -chris
> > ------------------------------
> > Chris Kratz
> > chris.kratz@vistashare.com
> >
> > ----- Original Message -----
> > From: "Chris Kratz" <chris.kratz@vistashare.com>
> > To: "Chris Kratz" <chris.kratz@vistashare.com>
> > Sent: Wednesday, November 07, 2001 2:49 PM
> > Subject: Re: [Zope] Zope hiccuping
> >
> >
> > > After posting this, I ran into some messages posted on the mailing list that
> > > seem to say that py_malloc is the culprit for this particular problem.  I'm
> > > in the process of trying this solution.  My apologies for the spam.  On the
> > > other hand if there are any other thoughts, I would appreciate hearing them
> > > as well.
> > >
> > > -Chris
> > > ------------------------------
> > > Chris Kratz
> > > chris.kratz@vistashare.com
> > >
> > >
> > > ----- Original Message -----
> > > From: "Chris Kratz" <chris.kratz@vistashare.com>
> > > To: <zope@zope.org>
> > > Sent: Wednesday, November 07, 2001 2:33 PM
> > > Subject: [Zope] Zope hiccuping
> > >
> > >
> > > > We have been noticing that periodically we get an error from IE that says
> > > > "Cannot find server or DNS error".  It is not easily reproducible (except by
> > > > just clicking links on the server), and an F5 refresh in the browser [almost]
> > > > always loads the page correctly.  I turned on logging today with the -M
> > > > startup option and observed the following entries when it happened:
> > > >
> > > > B 145697132 2001-11-07T17:15:38 GET /OutcomeTracker/Dev_News
> > > > I 145697132 2001-11-07T17:15:38 0
> > > > A 145697132 2001-11-07T17:15:39 200 32155
> > > > E 145697132 2001-11-07T17:15:39
> > > > B 145934020 2001-11-07T17:15:40 GET /OutcomeTracker/PeopleOrganizations/index_html
> > > > I 145934020 2001-11-07T17:15:40 0
> > > > B 135053764 2001-11-07T17:15:56 POST /OutcomeTracker/Activities/index_html
> > > > I 135053764 2001-11-07T17:15:56 2831
> > > > B 146464388 2001-11-07T17:15:57 GET /OutcomeTracker/PeopleOrganizations/index_html
> > > > I 146464388 2001-11-07T17:15:57 0
> > > > I 146464388 2001-11-07T17:15:57 0
> > > > A 146464388 2001-11-07T17:16:03 200 31200
> > > > E 146464388 2001-11-07T17:16:03
> > > >
> > > > Notice how the GET /OutcomeTracker/PeopleOrganizations/index_html never gets
> > > > the A or E lines, but only has a B and an I line.  The subsequent refresh
> > > > finished the request.  Interestingly, the two incomplete requests are not
> > > > logged to z2.log.  We can see the request before and the request after, but
> > > > that's it.  The other strangeness is that in the postgres log, we see a
> > > > "pq_recvbuf: unexpected EOF on client connection".  This seemed to point to
> > > > zope threads dying.  Since I'm not getting anything in the logs (*see below),
> > > > I started running tests with one eye on the currently running processes.
> > > > And sure enough, whenever I got that error at the browser (cannot find
> > > > server...), *All* of the zope threads (except the main starter thread) die
> > > > quietly and come back with new PIDs.  It really appears to rerun the
> > > > entire startup sequence.  With Z_DEBUG_MODE on I can watch it go
> > > > through the startup sequence again whenever this happens.  But there are no
> > > > tracebacks.  It's just like somebody clicked restart in the middle of a
> > > > process.
> > > >
> > > > The one glimmer of hope is in the stupid log file:
> > > >
> > > > 2001-11-07T19:30:23 ERROR(200) zdaemon zdaemon: Wed Nov  7 14:30:23 2001:
> > > > Aiieee! 1925 exited with error code: 11
> > > > ...restarting...
> > > >
> > > > Here are the questions:
> > > >
> > > > 1. It appears that something is causing those threads to crash (or end), but
> > > > nothing is getting put in the log file.  Is there any way to get the
> > > > tracebacks I assume are happening, or to find out what is going on?
> > > > 2. Alternatively, is there a way to run zope in single-threaded mode?
> > > > Z_DEBUG_MODE appears to only apply to the main thread because it goes ahead
> > > > and spawns additional threads.  If I use -t 0 I get two processes, but no
> > > > response from a web browser request.  If I use -t 1, I get three processes
> > > > owned by nobody and the original one by root.
> > > > 3. Any further ideas on how to debug this thing?  Where do I find what error
> > > > code 11 means?
> > > >
> > > > Thanks for your time and help,
> > > >
> > > > -Chris
> > > >
> > > > ------------------------------
> > > > Chris Kratz
> > > > chris.kratz@vistashare.com
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Zope maillist  -  Zope@zope.org
> > > > http://lists.zope.org/mailman/listinfo/zope
> > > > **   No cross posts or HTML encoding!  **
> > > > (Related lists -
> > > >  http://lists.zope.org/mailman/listinfo/zope-announce
> > > >  http://lists.zope.org/mailman/listinfo/zope-dev )
> > > >
> > >
> > >
> >
>