[ZODB-Dev] zeo client persistent cache files?

Sun Oct 31 20:15:20 EST 2004

On Sun, 2004-10-31 at 00:57, Tim Peters wrote:
> > I had always assumed the intent of creating a persistent client cache
> > file was that if you shut down the program which acts as the ZEO client
> > and restarted it, that "less communication" would need to happen on the
> > next run of the client program between the client and the server.

...

> > But in some cursory tests, I find that this appears to not be the case.
> 
> Sorry, it's impossible to tell from here.  Missing info:
> 

Right, of course.

> + How big is your ZEO client cache (measured in bytes).

1 GB.

> + How big is your ZODB pickle cache (measured in # of objects).

5000 objects

> + How big are your ~700 objects in aggregate (measured in bytes).

64K apiece * 700 = ~ 44MB.

> There are always two caches in play when using ZEO.  A Connection object has
> "a pickle cache", which is an in-memory cache.  By default, it holds 400
> objects across transaction boundaries, regardless of how much RAM they
> consume.  If you're loading the same 700 objects repeatedly across
> transactions, then about 400 of them are gotten from this memory cache.
> Note that the pickle cache persists even across closing and then opening a
> "new" Connection (Connections are actually cached in a pool when they're
> closed, by the DB object, and reused by subsequent DB.open() calls).

Got it.
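
To make the "about 400 of them" figure concrete, here is a toy model of the
behaviour described above. It is only a sketch under stated assumptions: the
class and function names are made up for illustration and are not ZODB APIs,
and it assumes the cache is allowed to grow during a transaction and is only
trimmed back to its target size, least recently used first, at each
transaction boundary.

```python
from collections import OrderedDict


class PickleCacheModel:
    """Hypothetical stand-in for a Connection's in-memory cache (not ZODB code)."""

    def __init__(self, target=400):
        self.target = target          # objects kept across transaction boundaries
        self.objects = OrderedDict()  # oid -> True, least recently used first
        self.hits = 0
        self.misses = 0

    def load(self, oid):
        if oid in self.objects:
            self.objects.move_to_end(oid)  # mark as most recently used
            self.hits += 1
        else:
            self.objects[oid] = True       # would fall through to the ZEO cache
            self.misses += 1

    def end_transaction(self):
        # Assumption: trimming happens only here, not on every load.
        while len(self.objects) > self.target:
            self.objects.popitem(last=False)


def run(n_objects=700, target=400, transactions=3):
    """Load the same objects in the same order once per transaction.

    Returns a list of (hits, misses) tuples, one per transaction.
    """
    cache = PickleCacheModel(target)
    per_txn = []
    for _ in range(transactions):
        hits_before, misses_before = cache.hits, cache.misses
        for oid in range(n_objects):
            cache.load(oid)
        cache.end_transaction()
        per_txn.append((cache.hits - hits_before, cache.misses - misses_before))
    return per_txn


if __name__ == "__main__":
    print(run(target=400))   # default-sized cache
    print(run(target=5000))  # the 5000-object setting mentioned above
```

With a target of 400, every transaction after the first gets 400 hits and 300
misses in this model. With a target of 5000, all 700 objects stay in memory
after the first pass, so nothing should reach the second-level cache at all
until the process restarts.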

> The ZEO client cache is a second-level cache, seeing only requests not
> satisfied by the pickle cache.  It's a disk-based cache, and defaults to a
> maximum size of 20MB, only half of which is fully usable at any time.  The
> default size is absurdly small given modern disk capacities.  Since you
> didn't say how big your objects are, I can't guess how much disk cache might
> be appropriate.

I've had it set to a gig for good measure.
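
For a rough sense of scale, the arithmetic can be sketched as below. The
helper name is made up, the 0.5 usable fraction is taken from the "only half
of which is fully usable" remark above and is assumed to apply at any cache
size, and per-record overhead in the cache file is ignored.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def objects_that_fit(cache_bytes, object_bytes, usable_fraction=0.5):
    """Count equal-sized objects that fit in the usable part of the cache.

    Hypothetical helper: ignores any per-record bookkeeping overhead.
    """
    return int(cache_bytes * usable_fraction) // object_bytes


object_size = 64 * KB            # figure quoted in this thread
working_set = 700 * object_size  # the ~700 objects being requested

default_fit = objects_that_fit(20 * MB, object_size)
gigabyte_fit = objects_that_fit(1 * GB, object_size)

print("working set in MB:", working_set / MB)
print("objects fitting in default cache:", default_fit)
print("objects fitting in 1 GB cache:", gigabyte_fit)
```

Under these assumptions the default 20MB cache has room for only 160 of the
700 objects, while a 1 GB cache has room for 8192, so size alone should not
be what keeps the objects out of the disk cache here.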

> > Then I shut down the client program and immediately restart it.  No other
> > processes are accessing the ZEO server, so no objects could have changed
> > on the server.  My first request for that same set of objects gives me
> > an aggregate transfer rate of 5MB/s and the ZEO log file shows lots of
> > zeoLoad calls which appear to have the same general makeup as the kinds
> > of requests that would have come in if the data was not in the client
> > cache.
> >
> > I would have expected the actual transfer rate to be somewhat higher;
> > maybe not the "70MB/s" optimum that appears to be the limit of
> > cache-to-app transfer, but somewhere between that and the "worst case"
> > 5MB/s of needing to load every object from the server again.
> 
> Your first scenario probably satisfied most requests out of memory (the
> pickle cache), not from the ZEO disk cache.

Good point.

>   I can guess that because the
> default pickle cache holds 400 objects, and you said you have 700.  There's
> not enough info to guess what to expect from the ZEO cache for the other 300
> objects.

In this case, with the pickle cache set to 5000 objects, all ~700 objects
fit in memory, so there should be no accesses coming from the disk cache.

> I don't know.  Try boosting the size of your ZEO cache and see whether that
> helps?  Hell, give it a gigabyte -- that's a great use for disk space, if
> you have some to spare.  It's even possible (but I think unlikely) that, if
> you're running on a fast local network, it's not any faster to get an object
> out of the ZEO client cache than to refetch it over the network.  Either way
> ends up reading the object pickle from disk, and also ends up needing to
> unpickle it, so the difference is largely network overheads.  The pickle
> cache avoids all of disk, unpickling, and network overheads, so to the
> extent the pickle cache came into play in your first scenario, it greatly
> accelerated object reuse.

I am indeed using a fast local network.  But as I said before, I think
the ZEO persistent cache isn't being consulted: when I send a request
that I believe should be serviced by the disk cache, the log output on
the ZEO server is indistinguishable from the log messages resulting from
real load requests.  I don't know the ins and outs of how the cache does
its invalidation though, so perhaps this is an incorrect assumption.

> It's also possible there's a new bug in ZODB 3.2.4, and it's also possible
> there's an old bug in ZODB 3.2.4, and it's also possible there's a new or
> old bug in Zope 2.7.3.  I don't believe anything relevant changed in ZODB in
> 3.2.4, except that the number of bytes ZEO needs to send over the network is
> potentially much smaller (up to a factor of 4) under 3.2.4 compared to 3.2.3
> or earlier.

Yup.

> A peculiarity with pragmatic importance:  because the ZEO cache is a
> second-level cache, it's predictable that it *doesn't* hang on to the most
> popular objects:  since all but the first request for a very popular object
> are satisfied by the pickle cache, such an object looks *unpopular* to the
> ZEO cache (which sees only the first request for it).  So if you've got a
> highly skewed access pattern, with relatively few very popular objects, it's
> predictable that a persistent ZEO cache won't do you much good.  But I don't
> think that applies to the scenario you're sketching here.

The access pattern is totally artificial but the only thing that should
be in the cache in this artificial case is the ~700 objects that I'm
requesting.
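
For reference, the "popular objects look unpopular" effect from the quoted
paragraph can be shown with a small simulation. This is a sketch only: the
function name is hypothetical, the first-level cache is modelled as a plain
LRU (a simplification, not how ZODB actually manages its pickle cache), and
the access pattern is invented for the demonstration.

```python
from collections import Counter, OrderedDict


def second_level_view(accesses, l1_capacity):
    """Return (requests seen by the app, requests seen by the second-level cache).

    Hypothetical model: the first-level cache is a simple LRU, and only its
    misses are passed on to the second-level cache.
    """
    l1 = OrderedDict()
    app_seen = Counter()
    l2_seen = Counter()
    for oid in accesses:
        app_seen[oid] += 1
        if oid in l1:
            l1.move_to_end(oid)         # first-level hit: second level never sees it
            continue
        l2_seen[oid] += 1               # first-level miss: second level is consulted
        l1[oid] = True
        if len(l1) > l1_capacity:
            l1.popitem(last=False)      # evict the least recently used object
    return app_seen, l2_seen


# One very popular object interleaved with 1000 objects used once each.
accesses = []
for i in range(1000):
    accesses.append("hot")
    accesses.append(i)

app_seen, l2_seen = second_level_view(accesses, l1_capacity=10)
print("app requests for hot object:", app_seen["hot"])
print("second-level requests for hot object:", l2_seen["hot"])
```

The application asks for the hot object 1000 times, but the second-level
cache sees exactly one request for it, the same as for each object used only
once. In my test every object is requested equally often, so this skew should
not be a factor.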

I will do some less ad-hoc testing and see what happens.

Thanks!

- C