[ZODB-Dev] ClientDisconnected error on windows

Jeremy Hylton jeremy@zope.com
Wed, 8 Aug 2001 19:01:00 -0400 (EDT)


>>>>> "JDH" == John D Heintz <jheintz@isogen.com> writes:

  JDH> Thanks, hopefully with the ZCF 0.6 code you should be able to
  JDH> replicate this on your own boxes as well.

Will you give me a recipe for running it to generate high load?

  >>
  >> The tracebacks you show seem like part of the documented behavior
  >> of ZEO.  You can get a ClientDisconnected error when you commit a
  >> transaction.  You can also get a POSException, a thread.error, or
  >> a socket.error.  [I'll treat your POSException suggestion
  >> separately.]

  JDH> Where is this documented?  My ideal would be something along
  JDH> the lines of BTrees.Interfaces with all exceptions declared.
  JDH> My suggestion about POSException was based primarily on the
  JDH> belief that *POSException* was the documentation.

Sigh.  I don't know if it's documented, but Jim's answer the last time
it came up was something like "Zope catches all exceptions and retries
the transaction."  In other words, we haven't documented the
exceptions carefully, nor have we been careful to keep track of what
exceptions should be raised.

I agree that this is a problem, but fixing it is a post-1.0 issue.

  >>
  >> It may take a moment for a client to reconnect, so under heavy
  >> load I can imagine getting this error several times before the
  >> client reconnects.  When you get this error, do you retry the
  >> transaction?  If so, does it succeed (eventually)?  Or is this
  >> client disconnected permanently?

  JDH> Umm, I think on Linux we are getting reconnected, but I'm not
  JDH> so sure about on Win2k.

How can we find out what happens one way or another on Win2k? :-)
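
One way to find out would be to wrap the commit in a retry loop and
count the failures.  The following is only a rough sketch, not ZEO
code: the ClientDisconnected class is a local stand-in so the example
runs on its own, and commit_with_retry and its parameters are names
made up for illustration.  In a real test you would pass in whatever
callable commits your transaction and the exception classes you
actually see.

```python
import time


class ClientDisconnected(Exception):
    """Local stand-in for the ZEO exception (an assumption for this sketch)."""


def commit_with_retry(commit, retries=5, delay=0.1,
                      retry_on=(ClientDisconnected,)):
    """Call commit() until it succeeds.

    Returns the number of failed attempts before the successful one.
    Re-raises the last error once more than `retries` attempts have failed,
    which suggests the disconnect is permanent rather than transient.
    """
    failures = 0
    while True:
        try:
            commit()
            return failures
        except retry_on:
            failures += 1
            if failures > retries:
                raise  # give up and let the caller see the error
            time.sleep(delay)  # give the client a moment to reconnect
```

If this returns a small number, the client is reconnecting; if it
always re-raises, the client is probably disconnected for good.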

  JDH> Regarding the reconnect time: The scenario we have is:
  JDH>
  JDH>   [ZEO Server] <-> [ZEO Client / CORBA Server] <==> [many CORBA Clients]

  JDH> With this setup why would there be any waiting to reconnect?
  JDH> The ZEO Server is only serving one ZEO Client and therefore
  JDH> should be able to respond immediately to a reconnect request.
  JDH> Right?

Sounds right to me.

  JDH> Right.  We can run all the processes on a single box and still
  JDH> experience the problem. We have no network failures that I'm
  JDH> aware of and I'm not doing anything but throw data at the CORBA
  JDH> Server (single ZEO Client) fast.  I would expect either
  JDH> asyncore or zrpc to handle the problem more gracefully.

Agreed.  This may be a bug, but I'm not sure who to blame yet.

  JDH> When the socket runs out of room asyncore should block further
  JDH> pushes until more room is available.  Are there tunable
  JDH> parameters in zrpc / asyncore / the OS to specify how much data
  JDH> should be cached for a socket?

I think the OS has some tunable parameters, but I don't think that
should enter into it.  The zrpc mechanism should queue things up until
asyncore (really the OS via select/poll) says the socket is ready.  It
may be that asyncore and the OS aren't agreeing on what the various
error returns from socket calls are supposed to mean.
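
To make the queueing idea concrete, here is a minimal sketch of the
behavior I would expect.  It is not the zrpc or asyncore code; the
OutputQueue class and the pump() helper are hypothetical names, and it
is written against modern Python 3 (socket.socketpair, BlockingIOError).
Data is held in a list and only written when select reports the socket
writable; a "would block" error is treated as "try again later", not as
a disconnect.

```python
import select
import socket


class OutputQueue:
    """Queue outgoing data and send it only when the socket has room."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = []  # chunks of bytes not yet fully sent

    def push(self, data):
        self.pending.append(data)

    def writable(self):
        return bool(self.pending)

    def handle_write(self):
        while self.pending:
            chunk = self.pending[0]
            try:
                sent = self.sock.send(chunk)
            except BlockingIOError:
                return  # socket buffer is full: keep the data, wait for select
            if sent < len(chunk):
                self.pending[0] = chunk[sent:]  # partial send: keep the rest
                return
            self.pending.pop(0)


def pump(queue, reader, total):
    """Drive the queue with select until `total` bytes arrive at `reader`."""
    received = 0
    while received < total:
        wlist = [queue.sock] if queue.writable() else []
        readable, writable, _ = select.select([reader], wlist, [], 1.0)
        if writable:
            queue.handle_write()
        if readable:
            data = reader.recv(65536)
            if not data:
                break  # peer closed the connection
            received += len(data)
    return received
```

For example, pushing a few megabytes through a socket.socketpair()
with pump() should deliver every byte, even though the socket buffer
fills up many times along the way.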

Jeremy