[Zope3-dev] race condition in Python2.3 asyncore

Steve Alexander steve@cat-box.net
Wed, 23 Oct 2002 15:46:53 +0000


I've been having fun dubugging a problem with Zope3 unit tests running 
on Python2.3 from CVS on intel Linux 2.4 kernel.

In this email, I explain the problem, suggest a change to the unit tests 
to prevent the problem holding up running the tests, and describe a fix 
to asyncore to fix the underlying problem.



The problem is with the FTPServer tests. Generally, the tests will pass 
and usually an error in a non-main thread will be printed to the console.

Traceback (most recent call last):
   File "/usr/local/lib/python2.3/threading.py", line 410, in __bootstrap
     self.run()
   File "/usr/local/lib/python2.3/threading.py", line 398, in run
     apply(self.__target, self.__args, self.__kwargs)
   File "testFTPServer.py", line 121, in loop
     poll(0.1, socket_map)
   File "/usr/local/lib/python2.3/asyncore.py", line 108, in poll
     r, w, e = select.select(r, w, e, timeout)
error: (9, 'Bad file descriptor')

(the line numbers are out a bit, as I've inserted a bunch of debugging 
probes)

Sometimes, the test will not complete, and will hang after printing this 
error to the console.


After much debugging, I've found a fix, to make sure the test does not 
hang, and also the cause of the problem.

The fix to ensure the test does not hang is to alter the loop() method 
from testFTPServer from this:

     def loop(self):
         while self.run_loop:
             self.counter = self.counter + 1
             # print " 'loop', self.counter
             poll(0.1, socket_map)

to this:

     def loop(self):
         import select
         from errno import EBADF
         while self.run_loop:
             self.counter = self.counter + 1
             # print " 'loop', self.counter
             try:
                 poll(0.1, socket_map)
             except select.error, data:
                 if data[0] == EBADF:
                     print "exception:", data
                 else:
                     raise


The cause of the problem seems to be a race condition around the 
socket_map global of asyncore.

If a socket is closed during the time between the socket_map being 
processed into arguments for select, and the select.select call, the 
call will raise a 'Bad file descriptor' error.

This error is only occuring in the testFTPServer test testPASS, and is 
related to this code in the FTPServerChannel class:

     def cmd_pass(self, args):
         'See Zope.Server.FTP.IFTPCommandHandler.IFTPCommandHandler'
         self.authenticated = 0
         password = args
         credentials = UsernamePassword(self.username, password)
         try:
             self.server.fs_access.authenticate(credentials)
         except Unauthorized:
             self.reply('LOGIN_MISMATCH')
             self.close_when_done()
         else:
             self.credentials = credentials
             self.authenticated = 1
             self.reply('LOGIN_SUCCESS')


The problem occurs when there is an Unauthorized error, and 
close_when_done() closes the socket.


I suggest that asyncore get patched to catch EBADF errors, and at that 
point see if the file descriptors in the current version of socket_map 
are the same as they were at the start of the method. If they are the 
same, re-raise, otherwise, ignore.

--
Steve Alexander