[Zodb-checkins] SVN: ZODB/branches/3.4/ Worm around suspected Windows socket bug in Windows trigger code.

Tim Peters tim.one at comcast.net
Mon Aug 1 16:02:24 EDT 2005


Log message for revision 37631:
  Worm around suspected Windows socket bug in Windows trigger code.
  
  See the thread starting at
   http://mail.zope.org/pipermail/zope/2005-July/160433.html
  for gory details.
  

Changed:
  U   ZODB/branches/3.4/NEWS.txt
  U   ZODB/branches/3.4/src/ZEO/zrpc/trigger.py

-=-
Modified: ZODB/branches/3.4/NEWS.txt
===================================================================
--- ZODB/branches/3.4/NEWS.txt	2005-08-01 18:44:41 UTC (rev 37630)
+++ ZODB/branches/3.4/NEWS.txt	2005-08-01 20:02:23 UTC (rev 37631)
@@ -5,6 +5,7 @@
 Following are dates of internal releases (to support ongoing Zope 2
 development) since ZODB 3.4's last public release:
 
+- 3.4.1b2 DD-MMM-2005
 - 3.4.1b1 26-Jul-2005
 - 3.4.1a6 19-Jul-2005
 - 3.4.1a5 12-Jul-2005
@@ -106,6 +107,17 @@
   example, debugging prints added to Python's ``asyncore.loop`` won't be lost
   anymore).
 
+Windows
+-------
+
+- (3.4.1b2) As developed in a long thread starting at
+  http://mail.zope.org/pipermail/zope/2005-July/160433.html
+  there appears to be a race bug in the Microsoft Windows socket
+  implementation, rarely visible in ZEO when multiple processes try to
+  create an "asyncore trigger" simultaneously.  Windows-specific code in
+  ``ZEO/zrpc/trigger.py`` changed to work around this bug when it occurs.
+
+
 Tools
 -----
 

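For readers unfamiliar with the "asyncore trigger" named in the NEWS entry: it is a pair of connected loopback TCP sockets, and writing one byte on one end wakes up a select() waiting on the other. Below is a minimal Python 3 sketch of that idea using only the standard socket and select modules. It is an illustration, not the ZEO code. The helper names make_trigger_pair, pull_trigger and wait_for_trigger are hypothetical, and this version does not include the retry workaround.

```python
import select
import socket


def make_trigger_pair():
    """Return (r, w): two connected loopback TCP sockets (hypothetical helper).

    A temporary listener is bound to port 0 so the OS picks a free port;
    w connects to it and the accepted socket becomes r.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # temporary listener
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Send the 1-byte "pull" immediately instead of letting TCP buffer it.
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        a.bind(("127.0.0.1", 0))
        a.listen(1)
        w.connect(a.getsockname())
        r, _addr = a.accept()
    except OSError:
        w.close()
        raise
    finally:
        a.close()
    return r, w


def pull_trigger(w):
    """Wake up whoever is select()ing on the other end."""
    w.send(b"x")


def wait_for_trigger(r, timeout):
    """Return True if r became readable within `timeout` seconds."""
    readable, _, _ = select.select([r], [], [], timeout)
    if readable:
        r.recv(1024)  # drain the wake-up byte so the next wait blocks again
        return True
    return False


if __name__ == "__main__":
    r, w = make_trigger_pair()
    print(wait_for_trigger(r, 0.0))  # prints False: nothing written yet
    pull_trigger(w)
    print(wait_for_trigger(r, 5.0))  # prints True: woken by the byte
    r.close()
    w.close()
```

In ZEO the r end is handed to asyncore.dispatcher, so asyncore's own select() loop plays the role that wait_for_trigger() plays here.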
Modified: ZODB/branches/3.4/src/ZEO/zrpc/trigger.py
===================================================================
--- ZODB/branches/3.4/src/ZEO/zrpc/trigger.py	2005-08-01 18:44:41 UTC (rev 37630)
+++ ZODB/branches/3.4/src/ZEO/zrpc/trigger.py	2005-08-01 20:02:23 UTC (rev 37631)
@@ -1,6 +1,6 @@
 ##############################################################################
 #
-# Copyright (c) 2001, 2002 Zope Corporation and Contributors.
+# Copyright (c) 2001-2005 Zope Corporation and Contributors.
 # All Rights Reserved.
 #
 # This software is subject to the provisions of the Zope Public License,
@@ -156,27 +156,61 @@
 
         def __init__(self):
             _triggerbase.__init__(self)
+
             # Get a pair of connected sockets.  The trigger is the 'w'
             # end of the pair, which is connected to 'r'.  'r' is put
             # in the asyncore socket map.  "pulling the trigger" then
             # means writing something on w, which will wake up r.
-            a = socket.socket() # temporary, to set up the connection
+
             w = socket.socket()
-            self.trigger = w
-            # set TCP_NODELAY to true to avoid buffering
-            w.setsockopt(socket.IPPROTO_TCP, 1, 1)
+            # Disable buffering -- pulling the trigger sends 1 byte,
+            # and we want that sent immediately, to wake up asyncore's
+            # select() ASAP.
+            w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
 
-            # Specifying port 0 tells Windows to pick a port for us.
-            a.bind(("127.0.0.1", 0))
-            connect_address = a.getsockname()  # assigned (host, port) pair
-            a.listen(1)
-            w.connect(connect_address)
+            count = 0
+            while 1:
+               count += 1
+               # Bind to a local port; for efficiency, let the OS pick
+               # a free port for us.
+               # Unfortunately, stress tests showed that we may not
+               # be able to connect to that port ("Address already in
+               # use") despite that the OS picked it.  This appears
+               # to be a race bug in the Windows socket implementation.
+               # So we loop until a connect() succeeds (almost always
+               # on the first try).  See the long thread at
+               # http://mail.zope.org/pipermail/zope/2005-July/160433.html
+               # for hideous details.
+               a = socket.socket()
+               a.bind(("127.0.0.1", 0))
+               connect_address = a.getsockname()  # assigned (host, port) pair
+               a.listen(1)
+               try:
+                   w.connect(connect_address)
+                   break    # success
+               except socket.error, detail:
+                   if detail[0] != errno.WSAEADDRINUSE:
+                       # "Address already in use" is the only error
+                       # I've seen on two WinXP Pro SP2 boxes, under
+                       # Pythons 2.3.5 and 2.4.1.
+                       raise
+                   # (10048, 'Address already in use')
+                   # assert count <= 2 # never triggered in Tim's tests
+                   if count >= 10:  # I've never seen it go above 2
+                       a.close()
+                       w.close()
+                       raise BindError("Cannot bind trigger!")
+                   # Close `a` and try again.  Note:  I originally put a short
+                   # sleep() here, but it didn't appear to help or hurt.
+                   a.close()
+
             r, addr = a.accept()  # r becomes asyncore's (self.)socket
             a.close()
+            self.trigger = w
             asyncore.dispatcher.__init__(self, r)
 
         def _close(self):
-            # self.socket is r, self.trigger is w from __init__
+            # self.socket is r, and self.trigger is w, from __init__
             self.socket.close()
             self.trigger.close()
 

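The retry strategy in the new __init__ can be sketched in modern Python 3 roughly as follows. This is an illustration under stated assumptions, not the code ZEO ships. It differs in three ways. First, it creates a fresh listening socket and a fresh connecting socket on every attempt, whereas the 2005 code reuses w; closing a socket after a failed connect() is the more portable choice. Second, it treats errno.EADDRINUSE, and errno.WSAEADDRINUSE (10048) where that name exists, as the transient error. Third, after max_tries attempts it re-raises the last OSError instead of raising BindError. The connect parameter is a hypothetical hook, added only so the retry path can be exercised without reproducing the Windows race.

```python
import errno
import socket

# Codes treated as the transient "Address already in use" failure.
# errno.WSAEADDRINUSE (10048) exists only on Windows builds of Python.
_ADDR_IN_USE = {errno.EADDRINUSE}
if hasattr(errno, "WSAEADDRINUSE"):
    _ADDR_IN_USE.add(errno.WSAEADDRINUSE)


def _is_addr_in_use(exc):
    # On Windows, Python 3 also exposes the raw Winsock code as exc.winerror.
    return (exc.errno in _ADDR_IN_USE
            or getattr(exc, "winerror", None) in _ADDR_IN_USE)


def _default_connect(sock, address):
    sock.connect(address)


def make_trigger_pair(max_tries=10, connect=_default_connect):
    """Return connected loopback sockets (r, w), retrying on EADDRINUSE.

    `connect` is a hypothetical test hook; real callers use the default.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # fresh listener
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # fresh trigger end
        try:
            w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            a.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
            a.listen(1)
            connect(w, a.getsockname())
            r, _addr = a.accept()
        except OSError as exc:
            w.close()
            if attempt < max_tries and _is_addr_in_use(exc):
                continue  # the finally clause still closes `a`
            raise  # any other error, or out of attempts
        finally:
            a.close()
        return r, w
```

For example, a connect hook that raises OSError(errno.EADDRINUSE, "Address already in use") once and then calls sock.connect(address) makes make_trigger_pair() succeed on its second attempt. A hook that always raises that error makes it give up after max_tries attempts.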