[Zope] EAGAIN errors crashing ZServer Aiieeee!!!!

Jon Prettyman jprettyman@acm.org
14 Mar 2000 10:33:29 -0800


Couldn't get gdb to play with Zope/Python, so I went the strace route
and guess what?

18417 select(27, [9 11 13 14 15 17 20 23 26], [], [], {30, 0}) = 1 (in [11], left {30, 0})
18417 accept(11, {sin_family=AF_INET, sin_port=htons(25342), sin_addr=inet_addr("209.154.157.3")}, [16]) = 21
18417 fcntl(21, F_GETFL)                = 0x2 (flags O_RDWR)
18417 fcntl(21, F_SETFL, O_RDWR|O_NONBLOCK) = 0
18417 gettimeofday({953058374, 704863}, NULL) = 0
18417 select(27, [9 11 13 14 15 17 20 21 23 26], [], [], {30, 0}) = 1 (in [9], left {30, 0})
18417 read(9, "x", 8192)                = 1
18417 select(27, [9 11 13 14 15 17 20 21 23 26], [20], [], {30, 0}) = 1 (out [20], left {30, 0})
18417 gettimeofday({953058374, 731513}, NULL) = 0
18417 write(5, "209.154.157.3 - - [14/Mar/2000:1"..., 173) = 173
18417 gettimeofday({953058374, 736116}, NULL) = 0
18417 write(16, "E 150812944 2000-03-14T18:26:14 "..., 33) = 33
18417 close(20)                         = 0
18417 select(27, [9 11 13 14 15 17 21 23 26], [], [], {30, 0}) = 1 (in [21], left {29, 990000})
18417 rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
18417 rt_sigsuspend([] <unfinished ...>
18417 --- SIGRT_0 (Real-time signal 0) ---
18417 <... rt_sigsuspend resumed> )     = -1 EINTR (Interrupted system call)
18417 sigreturn()                       = ? (mask now [])
18417 recv(21, "GET /premium/nmn/advert/tiny.gif"..., 4096, 0) = 510
18417 gettimeofday({953058374, 764510}, NULL) = 0
18417 gettimeofday({953058374, 766890}, NULL) = 0
18417 write(16, "B 140067656 2000-03-14T18:26:14 "..., 65) = 65
18417 gettimeofday({953058374, 771916}, NULL) = 0
18417 write(16, "I 140067656 2000-03-14T18:26:14 "..., 34) = 34
18417 kill(18437, SIGRT_0)              = 0
18417 select(27, [9 11 13 14 15 17 21 23 26], [], [], {30, 0}) = 1 (in [23], left {29, 610000})
18417 rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
18417 rt_sigsuspend([] <unfinished ...>
18417 --- SIGRT_0 (Real-time signal 0) ---
18417 <... rt_sigsuspend resumed> )     = -1 EINTR (Interrupted system call)
18417 sigreturn()                       = ? (mask now [])
18417 recv(23, "", 4096, 0)             = 0
18417 close(23)                         = 0
18417 select(27, [9 11 13 14 15 17 21 26], [], [], {30, 0}) = 1 (in [9], left {29, 600000})
18417 read(9, "x", 8192)                = 1
18417 select(27, [9 11 13 14 15 17 21 26], [21], [], {30, 0}) = 1 (out [21], left {30, 0})
18417 send(21, "HTTP/1.0 200 OK\r\nServer: Zope/Zo"..., 286, 0) = 286
18417 select(27, [9 11 13 14 15 17 21 26], [], [], {30, 0}) = 1 (in [21], left {29, 930000})
18417 rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
18417 rt_sigsuspend([] <unfinished ...>
18417 --- SIGRT_0 (Real-time signal 0) ---
18417 <... rt_sigsuspend resumed> )     = -1 EINTR (Interrupted system call)
18417 sigreturn()                       = ? (mask now [])
18417 kill(18437, SIGRT_0)              = 0
18417 recv(21, "", 4096, 0)             = 0
18417 close(21)                         = 0
18417 select(27, [9 11 13 14 15 17 26], [], [], {30, 0}) = 1 (in [9], left {30, 0})
18417 read(9, "x", 8192)                = 1
18417 select(27, [9 11 13 14 15 17 26], [], [], {30, 0}) = ? ERESTARTNOHAND (To be restarted)
18417 --- SIGSEGV (Segmentation fault) ---
18417 +++ killed by SIGSEGV +++

This came up after about 15 minutes of handling requests.  If anyone
wants to see more of the strace output, let me know and I'll post it
somewhere it can be seen.

Poking around in the system header files turns up this, in
/usr/include/linux/errno.h:

#ifndef _LINUX_ERRNO_H
#define _LINUX_ERRNO_H

#include <asm/errno.h>

#ifdef __KERNEL__

/* Should never be seen by user programs */
#define ERESTARTSYS	512
#define ERESTARTNOINTR	513
#define ERESTARTNOHAND	514	/* restart if no handler.. */
#define ENOIOCTLCMD	515	/* No ioctl command */

#endif

#endif
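
A minimal sketch of how the numbers here can be read, assuming a modern
Python 3 on Linux (signal.Signals did not exist in the Python of the time).
The helper name describe and the constant KERNEL_ERESTARTNOHAND are made up
for illustration; the value 514 is copied from the header above.  On Linux
the small integer 11 is both the errno EAGAIN and the signal SIGSEGV, so an
"11" exit code seen alongside the trace above may well be the segfault
rather than an EAGAIN error, while 514 has no user-space name at all:

```python
import errno
import signal

# Assumption: value copied from the linux/errno.h excerpt above.
KERNEL_ERESTARTNOHAND = 514


def describe(code):
    """Return the user-space errno and signal names that share this integer."""
    names = []
    # errno.errorcode maps errno numbers to names for the running platform.
    if code in errno.errorcode:
        names.append("errno " + errno.errorcode[code])
    # signal.Signals raises ValueError for numbers that are not signals.
    try:
        names.append("signal " + signal.Signals(code).name)
    except ValueError:
        pass
    return names


if __name__ == "__main__":
    # 11 is ambiguous on Linux: it names both an errno and a signal.
    print(11, describe(11))
    # The kernel-internal restart code is not visible from user space.
    print(KERNEL_ERESTARTNOHAND, describe(KERNEL_ERESTARTNOHAND))
```

This only shows that the two number spaces overlap; it does not say which
piece of code dereferenced a bad pointer.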

-Jon

Michel Pelletier <michel@digicool.com> writes:

> Jon Prettyman wrote:
> > 
> > I've been reading through code trying to figure out what is going on
> > here, where this message might be coming from.  My current train of
> > thought is that the 11 exit code being seen in z2.py is a result of
> > sys.ZServerExitCode getting set somewhere and z2.py exiting with that
> > code.
> > 
> > So I've been trying to find where code sets sys.ZServerExitCode and
> > what I've found is in ZServer.HTTPResponse.ChannelPipe.close.  In this
> > routine, the value of self._shutdown is assigned to r which then gets
> > assigned to sys.ZServerExitCode.
> > 
> > It looks like self._shutdown only gets assigned when
> > ZServer.HTTPResponse.ChannelPipe.finish gets called and a response
> > header contains a bobo-exception-type of exceptions.SystemExit.
> > 
> > So I'm guessing now that somewhere this exception is getting set but I
> > can't seem to figure out why.
>