[ZODB-Dev] ZEO server leaks FDs.

Andrew Kuchling akuchlin@mems-exchange.org
Wed, 17 Oct 2001 10:51:00 -0400


On Wed, Oct 17, 2001 at 09:50:29AM -0400, Greg Ward wrote:
>I think I would agree with Anthony's "very, very bad" assessment.  This
>might explain why our ZODB-backed FastCGI script gets wedged every
>couple of days, and needs to be manually restarted to bring our web site
>back to life.  Hmmm.

It might well explain the problem; good job, Anthony!  I just ran a
little loop to consume all of the ZEO process's file descriptors:

>>> import socket
>>> for i in range(1810):
...     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...     s.connect(('ute', 1972))

And the resulting symptoms sort of match what we've been seeing on our
live site: the Quixote process reports a timeout after 30 seconds of
waiting.  Greg, did you do a similar test (beyond just replicating
Anthony's bug report)?  
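
For anyone repeating the test, here is a minimal sketch of the same
connect/disconnect churn with the close made explicit, rather than
relying on the old socket object being collected when `s` is rebound.
The function name `churn_connections` and the timeout value are
illustrative assumptions, not part of ZEO or of the original test.

```python
import socket


def churn_connections(host, port, count, timeout=5.0):
    """Open and immediately close up to `count` TCP connections.

    Returns the number of connections that were established; stops at
    the first connection attempt that fails or times out.
    """
    completed = 0
    for _ in range(count):
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Refused, unreachable, or timed out: stop and report.
            break
        conn.close()
        completed += 1
    return completed
```

With the host and port from the loop above, the call would be
`churn_connections('ute', 1972, 1810)`. A successful connect only shows
that the TCP handshake completed, so it says nothing by itself about
whether the server went on to service the connection.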

What I don't understand is why this would affect kronos, though; do we
connect and disconnect from our ZEO server that often?  The
per-process FD limit is 4096, which means we can do 2048
connects/disconnects before running into trouble.  Quixote is started
once, and maybe restarted manually a few times; expire_sessions runs
nightly; add a few developers running opendb to check something or
make a fix.  That doesn't add up to 2048.  An acid test will be doing
an lsof on the ZEOD process the next time kronos hangs and seeing how
many pipes are open.
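
As a rough sketch of that check, the code below counts a process's open
descriptors by kind and redoes the arithmetic above. It assumes a Linux
/proc filesystem (where lsof is not available or not convenient) and
enough permission to read the target process's fd directory. All three
function names are hypothetical, and the default of two leaked
descriptors per connection is only inferred from the 4096 -> 2048
figures in this message.

```python
import os


def classify_fd_target(target):
    """Classify a /proc/<pid>/fd symlink target as pipe, socket, or other."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    return "other"


def count_fds_by_kind(pid):
    """Count the open descriptors of process `pid` by kind (Linux only)."""
    counts = {"pipe": 0, "socket": 0, "other": 0}
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were scanning
        counts[classify_fd_target(target)] += 1
    return counts


def connects_until_exhaustion(fd_limit, fds_in_use, leaked_per_connect=2):
    """Estimate how many more leaky connects fit under the FD limit.

    leaked_per_connect=2 is an assumption inferred from the figures in
    this message, not a measured value.
    """
    return max(0, fd_limit - fds_in_use) // leaked_per_connect


print(connects_until_exhaustion(4096, 0))  # prints 2048
```

Calling `count_fds_by_kind(pid)` with the server's process id when the
site hangs would show whether the pipe count is anywhere near the
limit.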



--amk