[ZODB-Dev] Very Weird Behaviour :-(

Toby Dickenson tdickenson at geminidataloggers.com
Mon Jun 9 11:53:41 EDT 2003


On Monday 09 June 2003 10:22, Chris Withers wrote:

> I awoke to a customer complaint that their site wasn't responding. On going
> there, it was indeed not responding. ssh'ed to the box to find one process
> using about 1GB of memory.

I'm sure you know this already, but you could have stopped this early using 
resource limits (to protect the rest of the machine from this rogue process) 
plus something like autolance (to restart the process before it hits the hard 
limit).
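
A minimal sketch of the resource-limit idea, assuming a Linux-like system
where Python's standard `resource` module can set RLIMIT_AS. The function
name and the 1 GiB figure in the note below are illustrative only; in
practice the same cap is often set with `ulimit -v` in the start script.

```python
import resource


def limit_address_space(max_bytes):
    """Cap this process's virtual address space (soft limit only).

    The hard limit is left alone, so the cap can never exceed it.
    Returns the soft limit that was actually applied.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard == resource.RLIM_INFINITY:
        new_soft = max_bytes
    else:
        # An unprivileged process may not raise soft above hard.
        new_soft = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return new_soft
```

Calling something like `limit_address_space(1 << 30)` early at startup
should make a runaway allocation fail inside that one process (typically
with a MemoryError) instead of starving the rest of the machine.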

> Turned out to be one of the threads on the
> web-serving ZEO client.

Using any CPU time, or stalled? If it was spinning, then I would approach the 
problem by attaching strace to it.

> Zope's stop script didn't work, so I had to kill -9 one of the worker
> threads for the web client to die. (This on its own seems to be quite a
> common pattern, why is that?)

Zope 2.6 handled shutdown signals by raising an exception. Anything that 
swallows Python exceptions can block a shutdown.
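
To illustrate how a swallowed exception can block a shutdown, here is a
small sketch. `ShutdownRequested` and both worker functions are
hypothetical names made up for this example; they are not Zope's actual
classes or code.

```python
class ShutdownRequested(Exception):
    """Hypothetical stand-in for the exception raised on a shutdown signal."""


def request_shutdown():
    raise ShutdownRequested()


def careless_worker(jobs):
    """A bare except hides the shutdown request and keeps on working."""
    results = []
    for job in jobs:
        try:
            results.append(job())
        except:  # catches everything, including ShutdownRequested
            results.append("error ignored")
    return results


def careful_worker(jobs):
    """Re-raise the shutdown request; only ordinary errors are ignored."""
    results = []
    for job in jobs:
        try:
            results.append(job())
        except ShutdownRequested:
            raise
        except Exception:
            results.append("error ignored")
    return results
```

With the careless version the job after the shutdown request still runs;
with the careful version the exception propagates to the caller.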

2.7 will do this differently: the signal handler sets a global, which is 
checked in the main medusa loop. I'm not sure if this would have helped in 
this case.
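
A rough sketch of that flag-based pattern, using only the standard
`signal` module. All names here are made up for illustration and this is
not the actual Zope 2.7 or medusa code. Note that a thread stuck outside
the loop would never get to check the flag.

```python
import signal

# Global flag set by the signal handler and polled by the main loop.
shutdown_requested = False


def on_shutdown_signal(signum, frame):
    """Signal handler: only record the request, do not raise."""
    global shutdown_requested
    shutdown_requested = True


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, on_shutdown_signal)
    signal.signal(signal.SIGINT, on_shutdown_signal)


def main_loop(poll_once):
    """Run poll_once until shutdown is requested; return the iteration count."""
    iterations = 0
    while not shutdown_requested:
        try:
            poll_once()
        except Exception:
            pass  # a swallowed error cannot hide the flag
        iterations += 1
    return iterations
```

Because the handler raises nothing, code that swallows exceptions inside
`poll_once` has no way to discard the shutdown request.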

> Now for the different bit, in the logs of the web client, I found lots of
> these:
>
> 2003-06-09T08:13:56 ERROR(200) ZODB Couldn't load state for
> '\x00\x00\x00\x00\x00\x12\x1d\x92'
> Traceback (innermost last):
>    File /usr/local/zope/2.6.1_rshl/lib/python/ZODB/Connection.py, line 509,
> in setstate
>    File /usr/local/zope/2.6.1_rshl/lib/python/ZEO/ClientStorage.py, line
> 524, in load
>      (Object: Content)
>    File /usr/local/zope/2.6.1_rshl/lib/python/ZEO/ServerStub.py, line 73,
> in zeoLoad
>    File /usr/local/zope/2.6.1_rshl/lib/python/ZEO/zrpc/connection.py, line
> 322, in call
> POSKeyError: 0000000000121d92
>
> Identical OIDs in all the errors...
>
> Now, I don't know if this was a symptom of the memory bloat or the cause,
> but these errors started at pretty much the same time as the server was
> first reported as being unresponsive...

Do you have a proxy in front of this Zope? Anything in its logs? Jamie Heilman 
posted a very effective memory-eating DoS script to zope-dev over the 
weekend.

> Interesting to note that the storage server appears to have survived
> unscathed throughout this.

Nothing in the storage server logs? Is that oid loadable now?
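
When searching the storage server logs for that oid, it helps to know that
the escaped string in the "Couldn't load state" line and the hex digits in
the POSKeyError are the same 8-byte value. A small stdlib-only sketch of
the conversion (the helper names are my own, not ZODB's):

```python
import struct


def oid_to_hex(oid):
    """8-byte oid -> 16 hex digits, as shown in the POSKeyError message."""
    return oid.hex()


def hex_to_oid(text):
    """16 hex digits -> 8-byte oid."""
    return bytes.fromhex(text)


def oid_to_int(oid):
    """8-byte big-endian oid -> plain integer."""
    return struct.unpack(">Q", oid)[0]


# The oid from the log excerpt quoted above.
logged_oid = b"\x00\x00\x00\x00\x00\x12\x1d\x92"
```

Here `oid_to_hex(logged_oid)` gives `0000000000121d92`, matching the
POSKeyError line.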

> So, I'm left rather nervous and wondering:
>
> 1. What caused the memory bloat, which apparently arrived out of the blue?
>
> 2. What those POSKeyErrors mean and should I be worried about them?
>
> Any ideas? zodb-dev seemed the right place as I suspect this is a ZODB
> problem at its base...

-- 
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson


