[Zope-dev] [Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss: would like to duplicate this

sathya sathya at zeomega.com
Sun Jun 27 17:48:00 EDT 2004


Tim Peters wrote:
Just to add my 2 cents:
I have been looking at the ZServer code. The only times fork or system
(which I presume invokes execve) calls are used are at startup, to either
a) run a command line
b) daemonize
Here's a snippet from strace output:
strace -o strace.txt -f -e trace=fork,execve ./runzope
on a ZEO instance:
15298 execve("./runzope", ["./runzope"], [/* 23 vars */]) = 0
15298 execve("/var/dev/vision/python233/bin/python", 
["/var/dev/vision/python233/bin/py"..., "/var/dev/vision/Zope27/lib/python
This creates a child process with PID 15299.

At this point the asyncore main loop thread has not even started, so it is
safe to assume that the parent does not start the asyncore loop for any
servers it creates; that happens in the forked child. This means we
probably cannot have multiple asyncore main loops running.
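
As a rough illustration of that ordering (fork first, start the loop only
in the child), here is a minimal Python sketch. It is not ZServer's actual
startup code: start_in_child and run_loop are hypothetical names, and real
daemonizing involves further steps, such as setsid(), that are left out.

```python
import os


def start_in_child(run_loop):
    """Fork once and call run_loop() only in the child.

    Hypothetical helper: the parent never enters the loop and just
    gets the child's PID back.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: the only place where the loop is started.
        status = 0
        try:
            run_loop()
        except BaseException:
            status = 1
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    # Parent process: returns immediately, no loop running here.
    return pid
```

With this shape there is only ever one loop, and it lives in the child;
the parent holds nothing but the child's PID.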

The ZEO client causes threads to be created, but there are no "fork" or
"system" calls as far as I (or strace, for that matter) can tell.
Can you please point out where in the ZEO code forking occurs? I
will try to duplicate this condition.
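
A minimal sketch of how such a condition might be duplicated outside ZEO,
assuming a Unix system: a socketpair stands in for the ZEO client/server
connection, and the forked child reads from the socket it inherited. The
name steal_demo and the message text are made up for illustration. This
only shows that a read in the child on an inherited descriptor consumes
data the parent would otherwise have seen; it says nothing about where
(or whether) ZEO itself forks.

```python
import os
import socket


def steal_demo():
    """Return (bytes read by the child, bytes left for the parent)."""
    # Stand-in for the client/server connection (hypothetical setup).
    server_end, client_end = socket.socketpair()

    # The "server" sends one message before the fork, so it is already
    # queued on the client's end of the connection.
    server_end.sendall(b"invalidate oid 42")

    # Pipe used by the child to report what it managed to read.
    report_r, report_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: shares the same underlying socket as the parent.
        try:
            os.close(report_r)
            os.write(report_w, client_end.recv(1024))
        finally:
            os._exit(0)

    # Parent: wait for the child, then collect its report.
    os.close(report_w)
    os.waitpid(pid, 0)
    stolen = os.read(report_r, 1024)
    os.close(report_r)

    # Parent now tries to read the message it was supposed to receive.
    client_end.setblocking(False)
    try:
        leftover = client_end.recv(1024)
    except BlockingIOError:
        leftover = b""  # nothing queued any more

    server_end.close()
    client_end.close()
    return stolen, leftover
```

Calling steal_demo() should return the message as read by the child and
an empty leftover for the parent, i.e. the parent never sees the message.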
-ty
sathya
> [Dieter Maurer]
> 
>>The problem occurred in a ZEO client which called "asyncore.poll"
>>in the forked subprocess. This "poll" deterministically
>>stole ZEO server invalidation messages from the parent.
> 
> 
> I'm sorry, but this is still too vague to guess what happened.
> 
> - Which operating system was in use?
> 
> - Which thread package?
> 
> - In the ZEO client that called fork(), did it call fork() directly, or
>  indirectly as the result of a system() or popen() call?  Or what?
>  I'd like to understand a specific failure before rushing to
>  generalization.
> 
> - In the ZEO client that called fork() (whether directly or indirectly),
>  was fork called *from* the thread running ZEO's asyncore loop,
>  or from a different thread?
> 
> 
>>I read the Linux "fork" manual page and found:
>>
>> fork creates a child process that differs from the parent process
>> only in its PID and PPID, and in the fact that resource utilizations
>> are set to 0. File locks and pending signals are not inherited.
>>
>> ...
>>
>> The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3
> 
> 
> If it conforms to POSIX (as it says it does), then fork() also has to
> satisfy the huge list of requirements I referenced before:
> 
>    http://www.opengroup.org/onlinepubs/009695399/functions/fork.html
> 
> That page is the current POSIX spec for fork().
> 
> 
>>I concluded that if the only difference is in the PID/PPID
>>and resource utilizations, there is no difference in the threads between parent
>>and child.  
> 
> 
> Except that if you're running non-POSIX LinuxThreads, a thread *is* a
> process (there's a one-to-one relationship under LinuxThreads, not the
> many-to-one relationship in POSIX), in which case "no difference in
> threads" is trivially true.
> 
> 
>>This would mean that the widespread "asyncore.mainloop" threads could suffer
>>the same message loss and message duplication.
> 
> 
> That's why all sane <wink> threading implementations do what POSIX
> does on a fork().  fork() and threading don't really mix well under
> POSIX either, but the "fork+exec" model for starting a new process is
> an historical burden that bristles with subtle problems in a
> multithreaded world; POSIX introduced posix_spawn() and posix_spawnp()
> for sane(r) process creation, ironically moving closer to what most
> non-Unix systems have always done to create a new process.
> 
> 
>>I did not observe a message loss/duplication in any
>>application with an "asyncore.mainloop" thread.
> 
> 
> I don't understand.  You said that you *have* seen message
> loss/duplication in a ZEO client, and I assume the ZEO client was
> running an asyncore thread.  If so, then you have seen
> loss/duplication in an application with an asyncore thread.
> 
> Or are you saying that you haven't seen loss/duplication under the
> specific Linux flavor whose man page you quoted, but have seen it
> under some other (so far unidentified) system?
> 
> 
>>Maybe the Linux "fork" manual page is simply not precise with respect
>>to threads and the problem does not occur in applications
>>with a standard "asyncore.mainloop" thread.
> 
> 
> That "fork" manpage is clearly missing a mountain of crucial details
> (or it's not telling the truth about being POSIX-compliant).  fork()
> is historically poorly documented, though.


-- 
===================================================
CEO
ZeOmega
Open minds' Open Solutions

Plano, Texas, USA
Bangalore, India
972-731-6750 (O)
214-733-3467 (M)
http://www.zeomega.com

Open source content management and workflow solutions
====================================================

