[ZODB-Dev] [Problem] strange state after SIGSEGV

Mon Mar 22 11:03:32 EST 2004

Dieter Maurer wrote:
> This problem report is for Zope 2.7.0, Python 2.3.3, Linux 2.4.19.
> 
> After an application provoked SIGSEGV (caused by a C runtime stack overflow),
> my Zope process entered a strange (and unhealthy) state:
> 
>   Zope did not die completely (as it should have done) but only partially:
>   One of the threads had disappeared, the others were in
>   the following state:
> 
>     *  their parent pid has been set to "1"
> 
>     *  attaching with "GDB" was only allowed as "root"
> 
>     *  at least two of the three remaining processes were waiting in "accept"
> 
>     *  they would not die on SIGTERM but only SIGKILL
> 
>   Consequences:
> 
>     *  Zope no longer responded to requests
> 
>     *  "stop" did not work (as "SIGTERM" was ineffective)
> 
>     *  "start" did not work, as the dangling processes kept
>        the HTTP port bound.
> 
> 
> Does anyone understand what can cause such a strange state?

This happens to me all the time while developing.  The most reliable 
way to get there is to Ctrl-C out of a 'pdb' session.

I can explain some of it.  Python threads other than the main thread set 
a mask that blocks most signals, but SIGKILL (9) can't be blocked.  You 
can find out the signal mask for a process by looking at the SigBlk line 
of /proc/(process_id)/status.  I think Python freezes because a lock 
held by the dead thread never gets released--perhaps the storage's 
commit lock.  The parent pid and gdb issues could be normal for Python 
threads.
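
As a rough sketch of the SigBlk check described above, something like the 
following could decode the mask.  It assumes a Linux /proc filesystem and 
that SigBlk is a hexadecimal bitmask in which bit n-1 stands for signal n. 
The function names are made up for illustration, and the code is written 
for a current Python rather than the 2.3 discussed in this thread.

```python
def parse_sigblk(status_text):
    """Return the SigBlk mask as an int, given the text of a status file."""
    for line in status_text.splitlines():
        if line.startswith("SigBlk:"):
            # The value is a hexadecimal bitmask, e.g. "0000000000004000".
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no SigBlk line found")


def blocked_signals(mask):
    """List the signal numbers whose bits are set in the mask."""
    # Assumption: bit (n - 1) set means signal n is blocked.
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def is_blocked(mask, signum):
    """Return True if the given signal number is blocked by the mask."""
    return bool(mask & (1 << (signum - 1)))


def read_sigblk(pid):
    """Read and parse SigBlk for a process id (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_sigblk(f.read())
```

For example, is_blocked(read_sigblk(pid), 15) would report whether SIGTERM 
is blocked for that process id, which is the case that makes "stop" 
ineffective.  Signal 9 (SIGKILL) should never appear in the mask.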

Shane