[ZODB-Dev] ZEO client leaking memory?

Jeremy Hylton jeremy@zope.com
Thu, 4 Oct 2001 13:06:03 -0400 (EDT)


>>>>> "CW" == Chris Withers <chrisw@nipltd.com> writes:

  CW> Hi, Just to let you guys know, I've noticed that my
  CW> lets-index-30,000-documents ZEO client appears to be leaking
  CW> memory. After doing about 10K docs, the ZEO _client_ process has
  CW> sucked so much memory that the machine churns to a painfully
  CW> slow stasis...

We're seeing a very similar problem with an internal project that uses
ZEO.  The client side consumes so much memory that the server side
fails with a MemoryError.  We haven't made much progress debugging,
because it has been possible to simply restart the job after a crash
and continue.  :-(.

It's hard to tell if this is a problem with ZEO or not.  If the client
is leaking memory, the leak could be in any of several places: in the
ClientStorage, in any of the Zope code used by the client, or in the
applications themselves.

The logging issue, incidentally, was with blather-level logging.  If
you crank the zLOG severity to its most verbose, ZEO will log every
message it sends.  Unfortunately, you pay some performance penalty for
this ability whether you enable that level of logging or not.  ZEO
must format strings and make zLOG calls, and it isn't until you
actually get into the zLOG implementation that you check the log level
and find out the message will get tossed.  Fortunately, we've wrapped
all these message-level log calls in "if __debug__:" tests.  If you
run python -O, you pay no penalty.  (The compiler knows statically
that __debug__ is false and doesn't generate any code for the if
statement.)
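
A minimal sketch of that guard pattern, for illustration only.  The
names log_blather() and send_message() are made up here; they are not
ZEO or zLOG functions.  It only shows where the formatting cost goes
away under python -O.

```python
# Hypothetical stand-ins: log_blather() and send_message() are
# illustrative names, not part of ZEO or zLOG.
logged = []

def log_blather(text):
    """Pretend logger; a real one would check the severity level here."""
    logged.append(text)

def send_message(msg):
    # Without -O, __debug__ is true: the string is formatted and the
    # logger is called even if the message is later thrown away.
    # With -O, the compiler omits this whole block, formatting included.
    if __debug__:
        log_blather("sending %r" % (msg,))
    return len(msg)

send_message(("invalidate", 42))
```

Run normally, the list picks up one entry; run with python -O, it
stays empty because the guarded block is never compiled in.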

So python -O makes a lot of sense in a production ZEO environment.
The only thing you lose with python -O is the ability to run the
Python debugger in that interpreter.

  CW> This, of course, could be a myriad of different things. Anyone
  CW> got any clues on how I can find out what's going on?

[More on this later.]  Jim noted that a number of the cache-related
calls take an age argument and do nothing when the age is 0.  They
also work at a granularity of three seconds, which means anything less
than three seconds is treated as zero.  I'll try to dig into that
issue this afternoon and have a better explanation.
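
To make the granularity point concrete, here is a rough sketch of the
arithmetic.  This is an assumption about how such rounding could look,
not the actual cache code; effective_age() and sweep() are
hypothetical names.

```python
GRANULARITY = 3  # seconds, per the description above

def effective_age(age_seconds):
    """Hypothetical: round an age down to a whole 3-second tick."""
    return (int(age_seconds) // GRANULARITY) * GRANULARITY

def sweep(age_seconds):
    """Hypothetical cache call: a no-op when the effective age is 0."""
    age = effective_age(age_seconds)
    if age == 0:
        return "no-op"
    return "sweep objects idle for >= %d s" % age
```

Under this assumption, an age of 2 seconds behaves exactly like an age
of 0, so the call does nothing at all.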

Jeremy