[Zope] Running more than one instance on windows often block each other

Tim Peters tim.peters at gmail.com
Tue Jul 26 12:26:40 EDT 2005


[Sune Brøndum Wøller]
> Thanks for the pointer. I have been debugging
> select_trigger.py, and have some more info:
>
> The problem is that the call a.accept() sometimes hangs.
> Apparently a.bind(self.address) allows us to bind to
> a port that another zope instance already is bound to.
>
> The code creates the server socket a, and the client socket w,
> and gets the client socket r by connecting w to a. Then it closes a.
> a goes out of scope when __init__ terminates, and is probably garbage
> collected at some point.

Unless you're using a very old Python, `a` is collected before the
call returns (where "the call" means the call of the function in which
`a` is a local variable).  Very old Pythons had an idiotic __del__
method attached to their Windows socket wrapper, which inhibited
timely gc.
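The timing claim rests on CPython's reference counting. An object that only a local variable refers to is freed when the function's frame is torn down, without waiting for the cycle collector. The following is a minimal Python 3 sketch that assumes CPython; other implementations such as PyPy may finalize later. The `Noisy` class and `make_local` function are hypothetical stand-ins for the socket `a` and `__init__`:

```python
import gc

class Noisy:
    """Hypothetical stand-in for the socket `a`; records when it is finalized."""
    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("collected")

def make_local(log):
    a = Noisy(log)          # `a` holds the only reference to the object
    log.append("returning")
    # When this frame goes away, the refcount drops to zero and __del__ runs.

log = []
gc.disable()                # show that the cycle collector is not what frees `a`
make_local(log)
gc.enable()
print(log)
```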

> I tried moving the code to the following standalone script, and I can reproduce
> the error with that. In the original code w is kept as an instance variable, and
> r is passed to asyncore.dispatcher.__init__  and probably kept there.

Yes, the socket bound to `r` also gets bound to `self.socket` by this call:

    asyncore.dispatcher.__init__ (self, r)

> I simulate that by returning them, then the caller of socktest can keep them
> around.
>
> I try to call socktest from different processes A and B (two pythons):
> (w,r = socktest())
> The call in A gets port 19999. The second call, in B, either blocks, or takes
> over port 19999 (I see the second process taking over the port in a port scanner.)

Sorry, I can't reproduce this -- but you didn't give a test program,
just an isolated function, and I'm not sure what you did with it.  I
called that function in an infinite loop, appending the return value
to a global list, with a short (< 0.1 second) sleep between
iterations, and closed the returned sockets fifty iterations after
they were created.  Ran that loop in two processes.  No hangs, or any
other oddities, for some minutes.  It did _eventually_ hang (both
processes at the same time), with netstat showing more than 4000
sockets hanging around in TIME_WAIT state by then.  I assume I bashed
into some internal Windows socket resource limit there, which Windows
didn't handle gracefully.  Attaching to the processes under the MSVC 6
debugger, they were hung inside the MS socket libraries.  Repeated
this several times (everything appeared to work fine until > 4000
sockets were sitting in TIME_WAIT, and then both processes hung at
approximately the same time).

Concretely:

import random, time

sofar = []
try:
    while 1:
        print '.',
        stuff = socktest()  # calling your function
        sofar.append(stuff)
        time.sleep(random.random()/10)
        if len(sofar) == 50:
            tup = sofar.pop(0)
            w, r = tup
            msg = str(random.randrange(1000000))
            w.send(msg)
            msg2 = r.recv(100)
            assert msg == msg2, (msg, msg2)
            for s in tup:
                s.close()
except KeyboardInterrupt:
    for tup in sofar:
        for s in tup:
            s.close()

Note that there's also a bit of code there to verify that the
connected sockets can communicate correctly; the `assert` never
triggered.

You haven't said which versions of Windows or Python you're using.  I
was using XP Pro SP2 and Python 2.3.5.  Don't know whether either
matters.

It was certainly the case when I ran it that your

>         print port

statement displayed port numbers less than 19999 at times, meaning that the

>             a.bind((host, port))

did raise an exception at times.  It never printed a port number less
than 19997 for me.  Did you ever see it print a port number less than
19999?
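The downward port walk in `socktest` only happens when `bind` fails. The following minimal Python 3 sketch reproduces that failure. It assumes default socket options, with no `SO_REUSEADDR` on either socket. A second socket that tries to bind an address already held by a listening socket gets `EADDRINUSE`. Windows reports this as error 10048 (`WSAEADDRINUSE`), as in Sune's message:

```python
import errno
import socket

# First socket: let the OS pick a free loopback port, then listen on it.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

# Second socket: try to claim exactly the same address.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    bind_failed = False
except OSError as exc:      # socket.error is an alias of OSError in Python 3
    bind_failed = True
    print("bind failed:", exc.errno, errno.errorcode.get(exc.errno, "?"))
finally:
    second.close()
    first.close()
```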

> a.bind in B does not raise socket.error: (10048, 'Address already in use') as
> expected, when the server socket in A is closed, even though the port is used by
> the client socket r in A.

I'm not sure what that's saying, but could be it's an illusion.  For example,

>>> import socket
>>> s = socket.socket()
>>> s.bind(('localhost', 19999))
>>> s.listen(2)
>>> a1 = socket.socket()
>>> a2 = socket.socket()
>>> a1.connect(('localhost', 19999))
>>> a2.connect(('localhost', 19999))
>>> b1 = s.accept()
>>> b2 = s.accept()
>>> b1[0].getsockname()
('127.0.0.1', 19999)
>>> b2[0].getsockname()
('127.0.0.1', 19999)
>>>

That is, it's normal for the `r` in

>     r, addr = a.accept()

to repeat port numbers across multiple `accept()` calls, and indeed to
duplicate the port number from the `bind` call.  This always confused
me (from way back in my Unix days -- it's not "a Windows thing"), and
maybe it's not what you're talking about anyway.
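Here is the same experiment in Python 3, as a sketch that uses loopback and lets the OS pick the port (port 0) instead of 19999. A TCP connection is identified by the full (local address, local port, remote address, remote port) four-tuple, so it is expected that accepted sockets share the listener's local port. The client's ephemeral port is what tells the connections apart:

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
server.listen(2)
port = server.getsockname()[1]

c1 = socket.create_connection(("127.0.0.1", port))
c2 = socket.create_connection(("127.0.0.1", port))
b1, peer1 = server.accept()
b2, peer2 = server.accept()

# Every accepted socket reports the listening port as its local port ...
local_ports = {b1.getsockname()[1], b2.getsockname()[1]}
# ... and the connections differ only on the client side of the four-tuple.
peer_ports = {peer1[1], peer2[1]}
client_ports = {c1.getsockname()[1], c2.getsockname()[1]}
print(local_ports == {port}, peer_ports == client_ports)

for s in (b1, b2, c1, c2, server):
    s.close()
```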

> If I remove a.close(), and keep a around (by passing it to the caller), a.bind
> works as expected - it raises socket.error: (10048, 'Address already in use').

As above, I'm seeing `bind` raise exceptions regardless.

> But in the literature on sockets, I read it should be okay to close the server
> socket and keep using the client sockets.
> 
> So, is this a possible bug in bind() ?

Sure feels that way to me, but I'm not seeing it (or don't know how to
provoke it).  But I'm not a socket expert, and am not sure I've ever
met anyone who truly was ;-)
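The claim from the socket literature can be checked directly. The following minimal Python 3 sketch assumes loopback and an OS-chosen port. After the listening socket is closed, the already-connected pair keeps working, and only new connection attempts are refused. It does not test the Windows `bind` behavior Sune describes, because that depends on platform-specific address-reuse rules:

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

w = socket.create_connection(("127.0.0.1", port))
r, _ = listener.accept()
listener.close()                # only the listening socket goes away

w.sendall(b"ping")              # the connected pair is unaffected ...
received = r.recv(100)

try:                            # ... but nothing accepts new connections now
    socket.create_connection(("127.0.0.1", port), timeout=5).close()
    refused = False
except ConnectionRefusedError:
    refused = True

print(received, refused)
w.close()
r.close()
```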

> I have tested the new code from Tim Peters, it apparently works, ports are given
> out by windows.
> But could the same problem with bind occur here, since a is closed (and garbage
> collected) ? (far less chance for that since we do not specify port numbers, I
> know).
>
> I tried getting a pair of sockets with Tim's code, and then trying to bind a
> third socket to the same port as a/r. And I got the same problem as above.

Here I'm not sure what "the same problem" means, as you've described
more than one problem.  Do you mean that you get a hang?  Or that you
see suspiciously repeated port numbers?  Or ...?  Seeing concrete code
might help.
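Sune describes Tim's revised code only as letting Windows hand out the ports. A common way to get that behavior is to bind the listener to port 0. The hypothetical `connected_pair` below is a Python 3 sketch of that idea, not Tim's actual code. In Python 3.5 and later, `socket.socketpair()` is also available on Windows, where it is emulated with a loopback listener in much the same way:

```python
import socket

def connected_pair(host="127.0.0.1"):
    """Return (w, r): two connected TCP sockets on `host`.

    Hypothetical helper. Binding the listener to port 0 lets the OS pick an
    unused ephemeral port, so no bind-retry loop is needed.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((host, 0))
        a.listen(1)
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            w.connect(a.getsockname())  # blocking connect; the backlog holds it
            r, _ = a.accept()
        except OSError:
            w.close()
            raise
    finally:
        a.close()   # the listener is needed only for this one accept()
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return w, r

w, r = connected_pair()
print("OS-assigned port:", r.getsockname()[1])
w.sendall(b"x")
print(r.recv(1))
w.close()
r.close()
```

The helper does not answer Sune's question about whether a third socket can bind the same port afterwards. It only removes the hard-coded port range.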

Last question for now:  have you seen a hang on more than one flavor
of Windows?  Thanks for digging into this!

[and Sune's code] 
> import socket, errno
> 
> class BindError(Exception):
>     pass
> 
> 
> def socktest():
>     """blabla
>     """
> 
>     address = ('127.9.9.9', 19999)
> 
>     a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>     w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> 
>     # set TCP_NODELAY to true to avoid buffering
>     w.setsockopt(socket.IPPROTO_TCP, 1, 1)
> 
>     # tricky: get a pair of connected sockets
>     host='127.0.0.1'
>     port=19999
> 
>     while 1:
>         print port
>         try:
>             a.bind((host, port))
>             break
>         except:
>             if port <= 19950:
>                 raise BindError, 'Cannot bind trigger!'
>             port=port - 1
> 
>     a.listen (1)
>     w.setblocking (0)
>     try:
>         w.connect ((host, port))
>     except:
>         pass
>     r, addr = a.accept()
>     a.close()
>     w.setblocking (1)
> 
>     #return (a, w, r)
>     return (w, r)
>     #return w
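For comparison, here is a sketch of a Python 3 rendering of `socktest`. It keeps the same 19999-to-19950 port walk. It catches `OSError` (the Python 3 name for `socket.error`) instead of using a bare `except:`, so that unrelated errors and `KeyboardInterrupt` are not swallowed. It also uses the named `socket.TCP_NODELAY` constant in place of the literal `1` and drops the unused `address` variable. The sketch assumes that the in-progress non-blocking `connect` raises `BlockingIOError`:

```python
import socket

class BindError(Exception):
    pass

def socktest(host="127.0.0.1", port=19999, lowest=19950):
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set TCP_NODELAY to avoid buffering small writes.
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # Walk down from `port` until bind() succeeds.
    while True:
        try:
            a.bind((host, port))
            break
        except OSError:
            if port <= lowest:
                a.close()
                w.close()
                raise BindError("Cannot bind trigger!")
            port -= 1

    a.listen(1)
    w.setblocking(False)
    try:
        w.connect((host, port))
    except BlockingIOError:
        pass                # connection still in progress; accept() finishes it
    r, addr = a.accept()
    a.close()
    w.setblocking(True)
    return w, r

if __name__ == "__main__":
    w, r = socktest()
    print("bound port:", r.getsockname()[1])
    w.close()
    r.close()
```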

