[Zope] Zope Zeo Performance

Fri Mar 14 15:05:46 EDT 2008

FuBuJo wrote at 2008-3-14 13:31 +0000:

You need to be a bit more careful in your description.

For example the diagram "Apache -> Zeo -> Zope(ZODB)" is
very confusing. It is very rare that Apache speaks to Zeo.

The confusion between Zope and Zeo may run through your whole
description, so it is often unclear whether you really
mean Zope when you write Zope and Zeo when you write Zeo.

More below.

> ...
>The traffic is heavy write traffic (I read some of Dieters posts and am testing
>that out as well). Once overall load hits about 100 people or so the Zeo's start
>dying

Here again, you use the wrong word: "dying" would mean that your ZEO
process terminates, but below you say that it gets slower.

>- heavy load, slow response, python takes all CPU/Memory.

Which "python"? The "python" executing Zeo? Or the one executing Zope?

> Then when
>traffic is removed from the ZEO instance ... the system remains CPU bound by the
>python process ... and you have to bounce Zope(Zeo instance) and Apache to free
>it.

Which system? The one running ZEO (the ZEO server) or the one running
Zope?

>The ZODB reports heavy Clients waiting ... but doesn't budge on load.
You see this in the ZEO logfile?
Then, it is ZEO which reports the waiting -- not the ZODB.

>So ... anyone have any suggestions.

We are having similar problems -- I call them commit congestions.

As far as we understand it so far, the problem has multiple causes.
Commit congestions can be caused on the client (=Zope) side and on the
server (=ZEO) side.

A client drastically increases the probability of commit congestions
when it does expensive things while it holds the commit lock, i.e.
during the second phase of the two-phase commit protocol.
We have identified three causes:

  *  garbage collections

     During a garbage collection the garbage collector holds
     the GIL and blocks all Python activity.
     We found that a single generation 2 (i.e. full) garbage
     collection can take between 10 and 20 s.
     We had a bad text index implementation
     that caused excessive object creation and thereby lots
     of garbage collections.

     Our remedy has been to drop the bad index implementation
     and to reconfigure the garbage collector to reduce the
     garbage collection frequency by a factor of 1000.

  *  "stat"s in the second commit phase.

     In our system, "stat"s for NFS served files could take up to
     27 s. It is a complete mystery why. Local IO, too, occasionally
     seemed to need excessive time. This, too, is still mysterious.
     We may have some hints: some ranking bugs in a search engine
     could cause millions of IO operations within a short timeframe
     and may have significantly affected the Linux IO behaviour.

  *  invalidation message reception and corresponding client cache updates
     during the second commit phase

Other causes for commit contention come from the (Zeo) server:

  *  "FileStorage.pack" unnecessarily holds the commit lock
     during large periods of the copying phase, drastically
     increasing the probability for commit contentions

  *  during some pack phase (reachability analysis),
     access to the storage file is high volume and erratic.
     This drastically reduces the performance of the storage
     and makes commit contentions likely.

  *  other heavy use of the file system can affect the IO performance
     available for storage access and can increase the
     likelihood of commit contentions.

>I can throw 10 more Apache/Zeo instances as it - but not sure if that's the
>right approach.

It is not. Commit contention is a synchronization problem.
It does not go away but is likely to increase when you scale
your frontends up.
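
A toy model may make this clearer. It is not how ZEO is implemented:
it simply assumes a single commit lock that is granted in arrival
order, and the function name "commit_waits" and all numbers are
invented for the illustration.

```python
def commit_waits(arrivals_ms, holds_ms):
    """Return how long each commit waits for a single, FIFO commit lock.

    arrivals_ms[i] is when commit i asks for the lock and holds_ms[i] is
    how long it keeps the lock; all times are integer milliseconds.
    """
    free_at = 0
    waits = []
    for arrive, hold in sorted(zip(arrivals_ms, holds_ms)):
        start = max(arrive, free_at)   # wait until the lock is released
        waits.append(start - arrive)
        free_at = start + hold
    return waits

# Ten commits arriving together, each holding the lock for 100 ms ...
ten = commit_waits([0] * 10, [100] * 10)
# ... versus twenty commits under the same conditions.
twenty = commit_waits([0] * 20, [100] * 20)
# One commit that holds the lock for 10 s (say, during a full garbage
# collection) stalls every commit queued behind it.
stalled = commit_waits([0, 1000, 2000, 3000, 4000],
                       [10000, 100, 100, 100, 100])

print(max(ten), max(twenty), stalled)
```

In this model, doubling the number of simultaneous commits roughly
doubles the longest wait, and a single long lock holder delays everyone
behind it. More frontends add commits; they do not shorten the time the
lock is held.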

>So I guess here's my questions.
>
>1. Is there a Zeo Client limit you can have when connecting to a Zope(Zeo
>Server) instance?

There is no limit in principle -- but as you can see,
lots of clients can affect performance.

Invalidation message processing poses a load on the server
which grows linearly with the number of clients (each client
must get all invalidations).

Most other Zeo load contributions depend more on the actual
number of requested operations (reads, writes, commits)
and less on the number of clients that request these operations
(of course, more clients can generate more requests).

>2. Are there any special setting to allow for 'many' Zeo
>clients connecting to Zeo server?

Reconfigure the Python garbage collector such that it runs far
less often.
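
As a rough sketch of what such a reconfiguration can look like with the
standard "gc" module: the helper name "scale_gc_threshold" is made up,
and the factor of 1000 is only the figure mentioned above, not a
recommendation for your installation. The code has to run early in the
startup of each Zope process.

```python
import gc

def scale_gc_threshold(factor):
    """Multiply the generation 0 threshold by factor; return (old, new)."""
    old = gc.get_threshold()
    # Only the first threshold is raised; the others are kept as they are.
    new = (old[0] * factor,) + tuple(old[1:])
    gc.set_threshold(*new)
    return old, new

old, new = scale_gc_threshold(1000)
print("gc thresholds: %r -> %r" % (old, new))
```

Raising the first threshold makes the youngest generation collect less
often, and the older generations follow because they are triggered by
counting collections of the younger ones. The price is that cyclic
garbage stays in memory longer, so watch the memory use afterwards.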

Get rid of components that (unnecessarily) create lots of Python objects.

Check whether you do unnecessary operations during the second commit
phase.
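
One way to find such operations is to time the suspicious calls and
record the slow ones. The following is a minimal sketch using only the
standard library; "timed" and "slow_calls" are hypothetical names, and
where you would wrap calls in your own code is for you to decide.

```python
import os
import time
from contextlib import contextmanager

slow_calls = []   # (label, seconds) for every block at or above the limit

@contextmanager
def timed(label, threshold_s=0.0):
    """Record how long the enclosed block took if it reached threshold_s."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= threshold_s:
            slow_calls.append((label, elapsed))

# Example: time a single "stat"; with the default limit of 0 every
# call is recorded.
with timed("stat of current directory"):
    os.stat(".")

for label, seconds in slow_calls:
    print("%s took %.6f s" % (label, seconds))
```

In practice you would set "threshold_s" to something like a second, so
that only the pathological cases show up.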

Place your ZODB storage files intelligently in the file system
such that other high volume IO operations do not badly affect
IO on the storage.
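
A quick first check is whether the storage and another busy directory
are on the same device at all. This sketch compares the device numbers
reported by "stat"; the helper name "same_device" is made up, and the
paths you compare are your own. Note that different device numbers can
still mean two partitions of one physical disk, so this only rules out
the obvious case.

```python
import os

def same_device(path_a, path_b):
    """True if both paths are on the same device as reported by stat."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Placeholder paths: replace them with the directory of your storage
# file and a directory that sees heavy IO.
print(same_device(".", os.getcwd()))
```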

-- 
Dieter