[ZODB-Dev] ClientDisconnected error on windows

John D. Heintz
Wed, 8 Aug 2001 18:50:07 -0500


On Wednesday 08 August 2001 18:01, Jeremy Hylton wrote:
> >>>>> "JDH" =3D=3D John D Heintz <jheintz@isogen.com> writes:
>
>   JDH> Thanks, hopefully with the ZCF 0.6 code you should be able to
>   JDH> replicate this on your own boxes as well.
>
> Will you give me a recipe for running it to generate high load?

Sure, multiThreadTest.py is the ticket.  Right now it is pretty
limited: it must be run in the foreground, and run multiple times to
hit the CORBA Server with multiple client threads.

In the directory with the ZCF files, run the following in one shell:
python startZeo.py &
python startServer.py zeo

This will start the ZEO and CORBA servers.  This also writes out the
SampleServer.ior file used by clients.

Now, execute in multiple other shells:
python multiThreadTest.py

This doesn't like being run in the background right now, sorry.  It
has a raw_input() call, so just hit enter to kill them.

We can run more of these multiThreadTest.py processes against Linux
than we can against Windows.
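
For reference on what "high load" means here: each multiThreadTest.py
process just spins up a handful of threads that throw data at the
CORBA Server as fast as they can.  Very roughly, the shape is
something like this sketch (the server proxy and its put() method are
placeholders for illustration, not the real SampleServer API):

    # Rough sketch of the load loop -- 'server' and put() are
    # hypothetical stand-ins for the CORBA proxy resolved from
    # SampleServer.ior, not the actual interface.
    import threading

    def hammer(server, count=1000):
        # each call goes through the CORBA Server, which commits a
        # ZODB transaction via its ZEO ClientStorage
        for i in range(count):
            server.put('key-%d' % i, 'x' * 1000)

    threads = [threading.Thread(target=hammer, args=(server,))
               for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()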

>
>   >> The tracebacks you show seem like part of the documented behavior
>   >> of ZEO.  You can get a ClientDisconnected error when you commit a
>   >> transaction.  You can also get a POSException, a thread.error, or
>   >> a socket.error.  [I'll treat your POSException suggestion
>   >> separately.]
>
>   JDH> Where is this documented?  My ideal would be something along
>   JDH> the lines of BTrees.Interfaces with all exceptions declared.
>   JDH> My suggestion about POSException was based primarily on the
>   JDH> belief that *POSException* was the documentation.
>
> Sigh.  I don't know if it's documented, but Jim's answer the last time
> it came up was something like "Zope catches all exceptions and retries
> the transaction."  In other words, we haven't documented the
> exceptions carefully, nor have we been careful to keep track of what
> exceptions should be raised.
>
> I agree that this is a problem, but fixing it is a post 1.0 issue.

Hmm.  I don't like the "catches all exceptions and retries the
transaction" bit.  I would much rather have a category of exceptions
that signals a retry, with all others explicitly raised.  This might
be me just prematurely optimizing, but I don't like wasting server
resources retrying transactions that will always fail.
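
To make that concrete, the client-side policy I have in mind looks
roughly like this (a sketch only; the import locations are from
memory, and commit_with_retry is a helper I'm inventing for
illustration):

    # Retry only exceptions that signal a transient condition; let
    # everything else propagate as a real failure.
    import time
    from ZODB.POSException import ConflictError
    from ZEO.ClientStorage import ClientDisconnected  # location may differ

    RETRYABLE = (ConflictError, ClientDisconnected)

    def commit_with_retry(work, attempts=3):
        for i in range(attempts):
            try:
                work()
                get_transaction().commit()
                return
            except RETRYABLE:
                get_transaction().abort()
                time.sleep(1)  # give the ZEO client time to reconnect
        raise RuntimeError('still failing after %d attempts' % attempts)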

>
>   >> It may take a moment for a client to reconnect, so under heavy
>   >> load I can imagine getting this error several times before the
>   >> client reconnects.  When you get this error, do you retry the
>   >> transaction?  If so, does it succeed (eventually)?  Or is this
>   >> client disconnected permanently?
>
>   JDH> Umm, I think on Linux we are getting reconnected, but I'm not
>   JDH> so sure about on Win2k.
>
> How can we find out what happens one way or another on Win2k? :-)

Would looking at the log files for the ZEO Server or ZEO Client be an
appropriate way to find that out?  I can dump logging info wherever we
need it; I've avoided it so far because I'm already feeling a little
information overload.

Let me know where and what logging you want, though, and we can get it.
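
If it's the standard zLOG-based logging, I believe cranking it up is
just a matter of environment variables, set before ZEO gets imported
(this is from memory -- the variable names and severity values may be
off):

    # zLOG, which ZEO logs through as far as I know, reads these at
    # import time, so set them before importing ZEO.
    import os
    os.environ['STUPID_LOG_FILE'] = '/tmp/zeo-client.log'
    os.environ['STUPID_LOG_SEVERITY'] = '-300'  # -300 = TRACE, log everything

    # equivalently, in the shell:
    #   STUPID_LOG_FILE=/tmp/zeo.log STUPID_LOG_SEVERITY=-300 python startZeo.py &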

>
>   JDH> Regarding the reconnect time: The scenario we have is: [ZEO
>   JDH> Server] <-> [ZEO Client / CORBA Server] <==> [many CORBA
>   JDH> Clients]
>
>   JDH> With this setup why would there be any waiting to reconnect?
>   JDH> The ZEO Server is only serving one ZEO Client and therefore
>   JDH> should be able to respond immediately to a reconnect request.
>   JDH> Right?
>
> Sounds right to me.
>
>   JDH> Right.  We can run all the processes on a single box and still
>   JDH> experience the problem. We have no network failures that I'm
>   JDH> aware of and I'm not doing anything but throw data at the CORBA
>   JDH> Server (single ZEO Client) fast.  I would expect either
>   JDH> asyncore or zrpc to handle the problem more gracefully.
>
> Agreed.  This may be a bug, but I'm not sure who to blame yet.

We're not into blame here, just getting to the right solution.  ;-)

>
>   JDH> When the socket runs out of room asyncore should block further
>   JDH> pushes until more room is available.  Are there tunable
>   JDH> parameters in zrpc / asyncore / the OS to specify how much data
>   JDH> should be cached for a socket?
>
> I think the OS has some tunable parameters, but I don't think that
> should enter into it.  The zrpc mechanism should queue things up until
> asyncore (really the OS via select/poll) says the socket is ready.  It
> may be that asyncore and the OS aren't agreeing on what the various
> error returns from socket calls are supposed to mean.
>
> Jeremy
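
That matches my mental model.  For what it's worth, the queueing
behavior I'd expect from the zrpc layer looks something like this
plain asyncore dispatcher (a sketch of the technique, not the actual
zrpc code):

    import asyncore
    import socket

    class BufferedSender(asyncore.dispatcher):
        # Queue outgoing data and only write when select() says the
        # socket is ready, keeping whatever the OS doesn't accept.
        def __init__(self, addr):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.connect(addr)
            self.outbuf = ''

        def handle_connect(self):
            pass

        def push_message(self, data):
            # never write directly; the socket buffer may be full
            self.outbuf = self.outbuf + data

        def writable(self):
            # only poll this socket for writing when data is queued
            return len(self.outbuf) > 0

        def handle_write(self):
            # send() reports how many bytes the OS took; keep the rest
            sent = self.send(self.outbuf)
            self.outbuf = self.outbuf[sent:]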

--
. . . . . . . . . . . . . . . . . . . . . . .

John D. Heintz | Senior Engineer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com

w w w . d a t a c h a n n e l . c o m