[Zope-dev] TCP CLOSE_WAIT leaks

Alan Milligan alan at balclutha.org
Sun Apr 2 22:05:06 EDT 2006


Dieter Maurer wrote:
> Alan Milligan wrote at 2006-3-28 10:44 +1000:

> A missing "clearing down" of the client's connection cannot be the 
> cause for this. The (worker) thread must finish (and you
> should see a log entry) long before this connection is expected to
> be closed. The problem you observe seems to indicate a problem
> inside your request processing. Activating the "debuglogger"
> (or similarly spelled) can prove whether or not this assumption
> is correct.
> 

You are correct.  It appears that all the threads are blocked in a
lock.acquire() call (on possibly three different locks, judging by the
three distinct futex addresses):

[root@gimp tmp]# strace -fp `cat /opt/zope2.8/instance/var/Z2.pid`
Process 12814 attached with 23 threads - interrupt to quit
[pid 12788] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12790] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12798] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12802] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12803] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12806] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12808] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12809] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12810] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12795] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12813] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12807] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12804] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12800] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12797] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12812] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12796] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12805] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12814] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12799] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12792] futex(0xa993760, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12801] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
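
To make the "possibly three locks" reading easier to check, here is a minimal sketch that tallies the waiting threads per futex address from strace output like the above. The function name group_by_futex and the SAMPLE text are hypothetical, introduced only for illustration, and the regular expression assumes the exact line shape shown in this capture.

```python
import re
from collections import Counter

# Matches lines such as:
#   [pid 12788] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
# The pattern is an assumption based on the capture in this post.
FUTEX_RE = re.compile(r"\[pid\s+(\d+)\]\s+futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT")


def group_by_futex(strace_text):
    """Return a Counter mapping futex address -> number of waiting threads."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = FUTEX_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# A few lines copied from the capture above, used as sample input.
SAMPLE = """\
[pid 12788] futex(0x83560c0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12798] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12802] futex(0x93e7f50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 12792] futex(0xa993760, FUTEX_WAIT, 0, NULL <unfinished ...>
"""

if __name__ == "__main__":
    # Print the busiest address first.
    for address, waiters in group_by_futex(SAMPLE).most_common():
        print(address, waiters)
```

Fed the full 22-line capture above, this should report 17 threads waiting on 0x93e7f50, 4 on 0x83560c0 and 1 on 0xa993760. The addresses only tell you how many distinct futexes are involved, not which Python-level locks they belong to.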


I tried to take Paul's advice to use gdb to get a python stacktrace, but
gdb wouldn't let me:

(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread -1469408336 (LWP 31487)]
0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (malloc) will be abandoned.


So, setting unwindonsignal ....


(gdb) set unwindonsignal on
(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread -1259328592 (LWP 31465)]
0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off"
Evaluation of the expression containing the function (PyRun_SimpleString) will be abandoned


The best I could do was a good old-fashioned where ...


(gdb) where
#0  0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00cb4a24 in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2  0x0099842c in ?? () from /usr/lib/libpython2.3.so.1.0
#3  0x0096f04e in PyThread_acquire_lock () from /usr/lib/libpython2.3.so.1.0
#4  0x009722f7 in _PyObject_GC_Del () from /usr/lib/libpython2.3.so.1.0
#5  0x0091b921 in PyCFunction_Call () from /usr/lib/libpython2.3.so.1.0
#6  0x0094e63f in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#7  0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#8  0x0094ee6b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#9  0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#10 0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#11 0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#12 0x0090be4e in PyFunction_SetClosure () from /usr/lib/libpython2.3.so.1.0
#13 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#14 0x008ffdb8 in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#15 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#16 0x008f8824 in PyObject_CallMethod () from /usr/lib/libpython2.3.so.1.0
#17 0xb7b3762b in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so
#18 0xa67cf3cc in ?? ()
#19 0xb7b3900a in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so
#20 0xb7b39008 in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so
#21 0x9c10776c in ?? ()
#22 0xb786cce8 in ?? () from /opt/zope2.8/lib/python/Persistence/_Persistence.so
#23 0x9c10776c in ?? ()
#24 0xa86a1968 in ?? ()
#25 0xb7b38ae4 in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so
#26 0xb7a83660 in ?? ()
#27 0xb7a83660 in ?? ()
#28 0xa86a19a8 in ?? ()
#29 0xb786b934 in ?? () from /opt/zope2.8/lib/python/Persistence/_Persistence.so
#30 0x9c10776c in ?? ()
#31 0x009a4a60 in _PyWeakref_RefType () from /usr/lib/libpython2.3.so.1.0
#32 0xa86a19a8 in ?? ()
#33 0xb7a83676 in ?? ()
#34 0xb796949d in ?? () from /opt/zope2.8/lib/python/BTrees/_OOBTree.so
#35 0xa86a1a30 in ?? ()
#36 0xa86a19b0 in ?? ()
#37 0xa86a19ac in ?? ()
#38 0xb7a07583 in ?? () from /opt/zope2.8/lib/python/Acquisition/_Acquisition.so
#39 0xb7a83660 in ?? ()
#40 0x00000000 in ?? ()


I must confess that I'm quite puzzled that the main thread is also
blocked - DeadlockDebugger's publish magic is not being invoked,
which is consistent with the strace output.
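
For comparison, here is a rough sketch of how a per-thread Python stack dump can be produced from inside a process. This is not DeadlockDebugger's implementation: it assumes a modern Python where sys._current_frames() exists (it is not available in the Python 2.3 used here), and the function name format_thread_stacks is hypothetical.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    # Map thread identifiers to names so the dump is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (thread_id, name))
        # format_stack walks from the outermost frame to the current one.
        chunks.append("".join(traceback.format_stack(frame)))
        chunks.append("\n")
    return "".join(chunks)


if __name__ == "__main__":
    sys.stderr.write(format_thread_stacks())
```

The catch is the one seen above: something still has to be able to run Python code to call it. If every thread, including the main one, is parked on a lock, an in-process dump like this never gets the chance to run, which would fit the publish hook not being invoked.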

I'll investigate further in a couple of hours when I've got another
borked server - maybe quicker if Baidu comes thru ;)

Alan

