[Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

Joseph Wayne Norton norton@alum.mit.edu
Tue, 11 Dec 2001 16:20:10 +0900


Hello.

We are seeing zope restarts on the solaris 5.6 platform with zope
2.4.3 and python 2.1.1.  I put together a script based on some
information from an old posting to the apache mailing list.  The
following shell/perl script lets one get a core file from a dying
zope child process and also lets zope restart without any side
effects.


The script ....

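#!/bin/sh
# Attach truss to every running z2.py process and grab a core file
# when one of them receives SIGSEGV or SIGILL.
PATH=$PATH:/usr/local/bin
export PATH
cd /tmp    # gcore writes core.$PID into the current directory
# "zfs" is the user that runs zope here.
# awk picks the pid even when ps pads the column with leading blanks.
for PID in `ps -u zfs -f -o pid,comm,args | fgrep z2.py | awk '{print $1}'`
do
    export PID
    # -t\!all traces no system calls; -S leaves the process stopped when
    # one of the listed signals arrives, so gcore can dump it before kill.
    truss -f -l -t\!all -S SIGSEGV,SIGILL -p $PID 2>&1 \
        | perl -pe 'system("gcore $ENV{PID} && sleep 5 && kill -9 $ENV{PID}"), exit($ENV{PID}) if /(SIGSEGV|SIGILL)/;' &
done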
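
The pid selection can be tried on its own before attaching truss to
anything.  This is only a minimal sketch: extract_pids is a made-up
helper name and the sample listing is invented for illustration, on the
assumption that ps right-aligns the pid column.  It uses awk for both
the match and the field split.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): read ps output
# on stdin and print the pid of every line that mentions z2.py.
extract_pids() {
    # awk splits on runs of whitespace, so leading blanks in front of
    # a short pid do not shift the pid out of field 1.
    awk '/z2\.py/ {print $1}'
}

# Invented sample in "pid comm args" layout, for illustration only.
sample='  PID COMMAND COMMAND
  812 python /usr/local/bin/python z2.py -D
12345 python /usr/local/bin/python z2.py -D
  900 sh /bin/sh'

# Prints 812 and 12345, one per line.
printf '%s\n' "$sample" | extract_pids
```

If the list looks right, the same pipeline can feed the for loop.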


Step 1: modify the script to match your environment (for example the
"zfs" user name and the PATH).

Step 2: execute the script.

Step 3: wait for a core file to be dumped in /tmp.

Step 4: analyze with gdb, where $PID is the pid of the dumped process.

bash# gdb /path/to/bin/python /tmp/core.$PID

#0  0xef5b9810 in _lwp_sema_wait ()
(gdb) where
#0  0xef5b9810 in _lwp_sema_wait ()
#1  0xef647ea0 in _park ()
#2  0xef647b84 in _swtch ()
#3  0xef6468a4 in cond_wait ()
#4  0xef6467c8 in _ti_pthread_cond_wait ()
#5  0x50220 in PyThread_acquire_lock (lock=0xd9d878, waitflag=1)
    at Python/thread_pthread.h:313
#6  0x51f18 in lock_PyThread_acquire_lock (self=0xda39b8, args=0x0)
    at ./Modules/threadmodule.c:67
#7  0x35db4 in fast_cfunction (func=0xda39b8, pp_stack=0xed40f828, na=0)
    at Python/ceval.c:2994
#8  0x33ca0 in eval_code2 (co=0x267848, globals=0x51ec4, locals=0x0, args=0x0,
    argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:1951

        :
        :


It seems that we are facing trouble due to the thread library on
solaris (unless the truss command has introduced a side-effect).
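
Since zope runs several threads, the frame gdb shows first may not
belong to the thread that took the signal.  One way to look at all of
them is sketched here, assuming your gdb can read the thread list from
the core file.  PYTHON, CORE and CMDS are placeholder names to adjust,
not anything from the script above.

```shell
#!/bin/sh
# Sketch only: print a backtrace for every thread in a core file.
# PYTHON and CORE are placeholders; set them to match your environment.
PYTHON=${PYTHON:-/path/to/bin/python}
CORE=${CORE:-/tmp/core.$PID}
CMDS=${TMPDIR:-/tmp}/gdbcmds.$$

# gdb reads these commands via -x and exits afterwards because of -batch.
cat > "$CMDS" <<'EOF'
info threads
thread apply all where
EOF

# Only call gdb when both gdb and the core file are actually present.
# The command file is left in place so it can be reused.
if [ -f "$CORE" ] && type gdb >/dev/null 2>&1; then
    gdb -batch -x "$CMDS" "$PYTHON" "$CORE"
fi
```

Run it as, for example, CORE=/tmp/core.$PID sh thisscript, with $PID
being the pid of the dumped process.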

Is anyone else facing similar troubles?  .... or maybe I should post
this to a python mailing list.

- joe