[Zope] Re: Running more than one instance on windows often block each other

Tim Peters tim.peters at gmail.com
Fri Jul 29 13:24:34 EDT 2005


The attached hasn't failed on my box (Win XP Pro SP2, Python 2.3.5)
in about two hours of running it in 3 processes.  I was using 2
processes before, but discovered it's much easier to provoke problems
using 3.  The number of ephemeral ports in use increases too, typically
hovering between 7 and 8 thousand after reaching steady state.

I'll let it run the rest of today, and start changing ZODB code if it
still looks good.  I hope someone(s) else will then volunteer to port
the Windows changes to all the copies of Medusa code in the various
active Zope trunks and branches.

This suffers from what I still believe to be bugs in the Windows
socket implementation, but there is only one symptom I see with this,
and the code uses try/except to implement what appears to be a
reliable workaround.

import socket, errno
import time, random

class BindError(Exception):
    pass

def socktest29():
    w = socket.socket()

    # Disable buffering -- pulling the trigger sends 1 byte,
    # and we want that sent immediately, to wake up asyncore's
    # select() ASAP.
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    count = 0
    while 1:
        count += 1
        # Bind to a local port; for efficiency, let the OS pick
        # a free port for us.
        # Unfortunately, stress tests showed that we may not
        # be able to connect to that port ("Address already in
        # use") even though the OS picked it.  This appears
        # to be a race bug in the Windows socket implementation.
        # So we loop until a connect() succeeds (almost always
        # on the first try).
        a = socket.socket()
        a.bind(("127.0.0.1", 0))
        print 'b',
        connect_address = a.getsockname()  # assigned (host, port) pair
        a.listen(1)
        try:
            w.connect(connect_address)
            print 'c',
            break
        except socket.error, detail:
            if detail[0] != errno.WSAEADDRINUSE:
                # "Address already in use" is the only error
                # I've seen on two WinXP Pro SP2 boxes, under
                # Pythons 2.3.5 and 2.4.1.
                raise
            # (10048, 'Address already in use')
            # assert count <= 2 # never triggered in Tim's tests
            if count >= 10:  # I've never seen it go above 2
                a.close()
                w.close()
                raise BindError("Cannot bind trigger!")
            # Close `a` and try again.  Note:  I originally put a short
            # sleep() here, but it didn't appear to help or hurt.
            print
            print detail, a.getsockname()
            a.close()
    
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    print 'a',
    a.close()
    print 'c',

    return (r, w)

sofar = []
try:
    while 1:
        print '.',
        stuff = socktest29()
        sofar.append(stuff)
        time.sleep(random.random()/10)
        if len(sofar) == 50:
            tup = sofar.pop(0)
            r, w = tup
            msg = str(random.randrange(1000000))
            w.send(msg)
            msg2 = r.recv(100)
            assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
            r.close()
            w.close()
except KeyboardInterrupt:
    for tup in sofar:
        for s in tup:
            s.close()
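
For readers on a current interpreter, here is a minimal sketch of the
same bind/connect/retry idea in Python 3 syntax.  It is not the code
from the post: the name make_trigger_pair and the max_tries parameter
are made up for illustration (the script above hardcodes 10), the debug
prints are dropped, and the errno check accepts either EADDRINUSE or,
where the errno module defines it, WSAEADDRINUSE.

```python
import errno
import socket


class BindError(Exception):
    """Raised when no connectable loopback port is found."""


# Error codes treated as "Address already in use".  WSAEADDRINUSE is
# only defined on Windows, so fall back to EADDRINUSE elsewhere.
_ADDR_IN_USE = {
    errno.EADDRINUSE,
    getattr(errno, "WSAEADDRINUSE", errno.EADDRINUSE),
}


def make_trigger_pair(max_tries=10):
    """Return (r, w), two connected loopback TCP sockets.

    max_tries is a hypothetical parameter for this sketch.
    """
    w = socket.socket()
    # Disable Nagle so a 1-byte trigger write is sent immediately.
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        for _attempt in range(max_tries):
            a = socket.socket()
            try:
                a.bind(("127.0.0.1", 0))  # port 0: let the OS pick
                a.listen(1)
                try:
                    w.connect(a.getsockname())
                except OSError as exc:
                    if exc.errno not in _ADDR_IN_USE:
                        raise
                    continue  # retry with a fresh listening socket
                r, _addr = a.accept()
                return r, w
            finally:
                a.close()  # the listener is never needed afterwards
        raise BindError("Cannot bind trigger!")
    except BaseException:
        w.close()
        raise
```

Usage mirrors the loop above: call make_trigger_pair(), send a byte on
w, and read it back from r.  The retry reuses w after a failed
connect(), as the script above does; whether that is safe on platforms
other than Windows is an assumption this sketch does not test.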

