[Zope-dev] RE: [ZODB-Dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss

Dieter Maurer dieter at handshake.de
Mon Jun 28 15:36:10 EDT 2004


Hi Tim,

Tim Peters wrote at 2004-6-27 17:06 -0400:
>[Dieter Maurer]
>> The problem occurred in a ZEO client which called "asyncore.poll"
>> in the forked subprocess. This "poll" deterministically
>> stole ZEO server invalidation messages from the parent.
>
>I'm sorry, but this is still too vague to guess what happened.

Even though I sometimes make errors, my responses usually contain
all relevant information.

>- Which operating system was in use?

The ZEO client application mentioned above is almost independent
of the operating system -- apart from the fact that it uses
"fork" (and therefore requires the OS to support it).

Therefore, I did not mention that the application was running
on Linux 2.

>- Which thread package?

The application mentioned above does not use any threads.
Therefore, it is independent of the thread package.
If it used threads, the package would be "LinuxThreads" (but it does not).


There is no mystery at all that the application lost ZEO server
invalidation messages. It directly follows from the fork
semantics with respect to file descriptors.
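
The descriptor sharing can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: a socket pair stands in for
the ZEO connection (no ZEO code is involved), and the function name and the
message payload are hypothetical. The forked child reads from the inherited
descriptor, and the parent afterwards finds nothing left to read.

```python
import os
import socket


def child_steals_message():
    """Return (bytes read by the child, bytes still readable by the parent)."""
    server_end, client_end = socket.socketpair()  # stand-in for the ZEO connection
    report_r, report_w = os.pipe()                # child reports what it read

    server_end.sendall(b"invalidate oid 42")      # message destined for the parent

    pid = os.fork()
    if pid == 0:
        try:
            # Child: the inherited descriptor refers to the very same socket.
            data = client_end.recv(1024)
            os.write(report_w, data)
        finally:
            os._exit(0)

    os.close(report_w)
    os.waitpid(pid, 0)
    stolen = os.read(report_r, 1024)
    os.close(report_r)

    # Parent: check without blocking whether the message is still there.
    client_end.setblocking(False)
    try:
        left = client_end.recv(1024)
    except BlockingIOError:
        left = b""
    server_end.close()
    client_end.close()
    return stolen, left
```

Calling "child_steals_message()" on a system with "fork" should return the
full message as the first element and an empty byte string as the second.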


The problem I saw for wider Zope/ZEO client usage came solely
from reading the Linux "fork" manual page, which indicates
(or at least can be interpreted as indicating) that child and parent have the same threads.
There was no concrete observation that messages are lost or duplicated
in this scenario.


Meanwhile, I checked that "fork" under Linux with LinuxThreads
behaves with respect to threads as dictated by the POSIX
standard: the forked process has a single thread and
does not inherit other threads from its parent.
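
A check of this kind can be sketched as follows. It is not the test I
ran, only a minimal illustration; the function name is hypothetical. Note
that "threading.active_count" reports the interpreter's own bookkeeping,
which Python resets in a forked child, so this shows Python's view rather
than a direct look at the operating system's thread table.

```python
import os
import threading
import warnings


def thread_counts_across_fork():
    """Return (threads in parent, threads in forked child), as Python sees them."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)  # extra thread, idle until stopped
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns when fork() is called while several threads run.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        try:
            # Child: report how many threads exist on this side of the fork.
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    os.waitpid(pid, 0)
    child_count = int(os.read(read_fd, 16))
    os.close(read_fd)

    stop.set()
    worker.join()
    return parent_count, child_count
```

The parent count should be at least 2 (the calling thread plus the
worker), while the child should report a single thread.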

I will soon check how our Solaris version of Python behaves.
If the forked child there has only one thread as well, I will apologize for
the premature warning...


>- In the ZEO client that called fork(), did it call fork() directly, or
> indirectly as the result of a system() or popen() call?  Or what?
> I'd like to understand a specific failure before rushing to
> generalization.

The ZEO client has the basic structure:

    from os import fork
    from time import sleep

    while True:
        work_to_do = get_work(...)
        for work in work_to_do:
            pid = fork()
            if pid == 0:
                # child process
                do_work(work)
                # will not return
        sleep(...)

"do_work" opens a new ZEO connection.
"get_work" and "do_work" use "asyncore.poll" to
synchronize with incoming messages from ZEO -- there is no
"asyncore.mainloop" around.
The "poll" in "do_work" stole ZEO invalidation messages
destined for the parent, so that "get_work" read old state
and returned work items that were already completed. That is the problem
I saw.
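
One possible way to avoid the stealing, sketched under assumptions and not
taken from the application itself: the child closes the descriptors it
inherited before it polls anything, and polls only its own connection.
Socket pairs stand in for the ZEO connections, "select" stands in for
"asyncore.poll" (the "asyncore" module was removed from the standard
library in Python 3.12), and all function names are hypothetical.

```python
import os
import select
import socket


def poll_once(connections, timeout=0.5):
    """Read one chunk from every connection that has data waiting."""
    readable, _, _ = select.select(connections, [], [], timeout)
    return [conn.recv(1024) for conn in readable]


def parent_keeps_its_messages():
    """Fork a worker that polls only its own link; return what the parent reads."""
    server_end, parent_conn = socket.socketpair()  # stand-in for the parent's link
    server_end.sendall(b"invalidate oid 42")       # message destined for the parent

    pid = os.fork()
    if pid == 0:
        try:
            # Child: drop the inherited descriptors before any polling.
            parent_conn.close()
            server_end.close()
            child_server, child_conn = socket.socketpair()  # the child's own link
            child_server.sendall(b"child data")
            poll_once([child_conn])                # cannot touch parent_conn
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    received = poll_once([parent_conn])
    server_end.close()
    parent_conn.close()
    return received
```

Closing a descriptor in the child affects only the child's copy, so the
parent's message is still waiting when the parent polls.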

All this is easy to understand, (almost) platform independent
and independent of the thread library.


*Iff* a thread library lets a forked child inherit all threads,
then the problem I announced in this "Warning" thread can
occur, as the child then behaves similarly to my application
above (with an automatic rather than an explicit "poll").

It may well be that there is no thread library that does this.
In your words: all thread implementations may be "sane"
with respect to thread inheritance...

-- 
Dieter

