[ZODB-Dev] ClientDisconnected error on windows
Jeremy Hylton
jeremy@zope.com
Wed, 8 Aug 2001 19:01:00 -0400 (EDT)
>>>>> "JDH" == John D Heintz <jheintz@isogen.com> writes:
JDH> Thanks, hopefully with the ZCF 0.6 code you should be able to
JDH> replicate this on your own boxes as well.
Will you give me a recipe for running it to generate high load?
>>
>> The tracebacks you show seem like part of the documented behavior
>> of ZEO. You can get a ClientDisconnected error when you commit a
>> transaction. You can also get a POSException, a thread.error, or
>> a socket.error. [I'll treat your POSException suggestion
>> separately.]
JDH> Where is this documented? My ideal would be something along
JDH> the lines of BTrees.Interfaces with all exceptions declared.
JDH> My suggestion about POSException was based primarily on the
JDH> belief that *POSException* was the documentation.
Sigh. I don't know if it's documented, but Jim's answer the last time
it came up was something like "Zope catches all exceptions and retries
the transaction." In other words, we haven't documented the
exceptions carefully, nor have we been careful to keep track of what
exceptions should be raised.
I agree that this is a problem, but fixing it is a post-1.0 issue.
>>
>> It may take a moment for a client to reconnect, so under heavy
>> load I can imagine getting this error several times before the
>> client reconnects. When you get this error, do you retry the
>> transaction? If so, does it succeed (eventually)? Or is this
>> client disconnected permanently?
JDH> Umm, I think on Linux we are getting reconnected, but I'm not
JDH> so sure about on Win2k.
How can we find out what happens one way or another on Win2k? :-)
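One way to find out might be to wrap the commit in a loop that logs
every failure and retries. What follows is only a sketch under stated
assumptions: ClientDisconnected here is a local stand-in class rather
than the real ZEO exception, and commit_with_retry, do_transaction,
attempts, and delay are hypothetical names invented for illustration.

```python
import time


class ClientDisconnected(Exception):
    """Local stand-in for the ZEO exception of the same name (assumption)."""


def commit_with_retry(do_transaction, retryable=(ClientDisconnected,),
                      attempts=5, delay=0.5, log=print):
    """Run do_transaction(), retrying on the listed exceptions.

    Returns (result, number_of_attempts_used).  Re-raises the last
    error if every attempt fails, which is what a permanently
    disconnected client would look like.
    """
    for attempt in range(1, attempts + 1):
        try:
            return do_transaction(), attempt
        except retryable as exc:
            # Record each failure so the log shows how long reconnecting took.
            log("attempt %d failed: %r" % (attempt, exc))
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the client a moment to reconnect
```

If the log shows a few failures followed by a success, the client is
reconnecting. If the call raises after the last attempt, the client is
either disconnected for good or needs more time than the loop allowed.
The retryable tuple could be widened to cover the other errors
mentioned earlier (POSException, thread.error, socket.error).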
JDH> Regarding the reconnect time: The scenerio we have is: [ZEO
JDH> Server] <-> [ZEO Client / CORBA Server] <==> [many CORBA
JDH> Clients]
JDH> With this setup why would there be any waiting to reconnect?
JDH> The ZEO Server is only serving one ZEO Client and therefore
JDH> should be able to respond immediately to a reconnect request.
JDH> Right?
Sounds right to me.
JDH> Right. We can run all the processes on a single box and still
JDH> experience the problem. We have no network failures that I'm
JDH> aware of and I'm not doing anything but throw data at the CORBA
JDH> Server (single ZEO Client) fast. I would expect either
JDH> asyncore or zrpc to handle the problem more gracefully.
Agreed. This may be a bug, but I'm not sure who to blame yet.
JDH> When the socket runs out of room asyncore should block further
JDH> pushes until more room is available. Are there tunable
JDH> parameters in zrpc / asyncore / the OS to specify how much data
JDH> should be cached for a socket?
I think the OS has some tunable parameters, but I don't think that
should enter into it. The zrpc mechanism should queue things up until
asyncore (really the OS via select/poll) says the socket is ready. It
may be that asyncore and the OS aren't agreeing on what the various
error returns from socket calls are supposed to mean.
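To make the queueing idea concrete, here is a rough sketch of "queue
things up until select says the socket is ready." It is not the zrpc
or asyncore code; QueuedSender, push, and pump are hypothetical names,
and it uses only the standard select and socket modules.

```python
import collections
import select
import socket


class QueuedSender:
    """Hold outgoing data in memory and send only when the socket is writable."""

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)
        self.queue = collections.deque()

    def push(self, data):
        # Never touches the socket, so a full send buffer cannot raise here.
        self.queue.append(data)

    def pump(self, timeout=0.0):
        """Send queued data while select() reports the socket writable.

        Returns the number of bytes handed to the OS on this call.
        """
        sent_total = 0
        while self.queue:
            _, writable, _ = select.select([], [self.sock], [], timeout)
            if not writable:
                break  # send buffer is full; try again later
            chunk = self.queue[0]
            try:
                n = self.sock.send(chunk)
            except BlockingIOError:
                break  # select and send disagreed; treat as "not ready"
            sent_total += n
            if n < len(chunk):
                self.queue[0] = chunk[n:]  # keep the unsent tail queued
            else:
                self.queue.popleft()
        return sent_total
```

The except branch covers the case raised above, where select and the
send call disagree about readiness: the sketch treats that as "not
ready yet" and leaves the data queued instead of raising an error.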
Jeremy