[ZODB-Dev] Server-side caching

Mon Feb 13 14:49:20 UTC 2012

On Mon, Feb 13, 2012 at 5:06 AM, Pedro Ferreira
<jose.pedro.ferreira at cern.ch> wrote:
> Dear Jim,
>
> Thanks for your answer.
>
>
>> The OS' file-system cache acts as a storage server cache.  The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided by
>> the storage server.
>
>
> I see, then I guess it would be good to have at least the same amount of RAM
> as the total size of the DB, no? From what I see in our server, the Linux
> buffer cache takes around 13GB of the 16GB available, while the rest is
> mostly taken by the ZEO process (1.7GB). The database is 17GB on disk.

Having enough RAM to hold your entire database may not be practical.
Ideally, you want enough to hold the working set.  For many applications,
most of the database reads are from the later part of the file.  The working
set is often much smaller than the whole file.

>
>
>> Also note that, for better or worse, FileStorage uses an in-memory index
>> of current record positions, so no disk access is needed to find current
>> data.
>
>
> Yes, but pickles still have to be retrieved, right?

Yes, but this is better than having to do disk accesses to get the metadata
needed to find the records.
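
As a rough illustration of why an in-memory index helps, here is a toy
append-only record file. It is only a sketch and not FileStorage's actual
code or file format; the class name, the length-prefixed record layout, and
the oids are all made up for the example.

```python
import io
import struct


class ToyAppendOnlyStore:
    """Toy append-only record file with an in-memory index (hypothetical)."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._index = {}  # oid -> file offset of the current record

    def store(self, oid, data):
        # Append a length-prefixed record and remember where it starts.
        self._file.seek(0, io.SEEK_END)
        pos = self._file.tell()
        self._file.write(struct.pack(">I", len(data)) + data)
        # Older revisions stay in the file but are no longer indexed.
        self._index[oid] = pos

    def load(self, oid):
        # One dictionary lookup, then one seek and read: no on-disk
        # metadata has to be searched to locate the record.
        self._file.seek(self._index[oid])
        (length,) = struct.unpack(">I", self._file.read(4))
        return self._file.read(length)


store = ToyAppendOnlyStore(io.BytesIO())
store.store(b"oid-1", b"first pickle")
store.store(b"oid-2", b"another pickle")
store.store(b"oid-1", b"newer pickle")
current = store.load(b"oid-1")  # reads only the latest record for oid-1
```

The record data itself still has to come from disk (or the OS cache), but
finding it costs no extra reads.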

> I guess this would mean
> random access (for a database like ours, in which we have many small
> objects), which doesn't favor cache performance.

I don't see how this follows.

...

>> In general, I'd say no.  It can depend on lots of details, including:
>>
>> - database size
>> - active set size
>> - network speed
>> - memory and disk speeds on clients and servers
>> - ...
>
>
> In any case, from what I see, these client caches cannot be shared between
> processes, which doesn't make them very useful in our case, in which we have
> many parallel processes asking for the same objects over and over again.

The caches are still probably providing benefit, depending on how large they
are.  If you haven't, you should probably try using the ZEO cache-analysis
scripts to get a better handle on how effective your cache is and whether it
should be larger.
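
To give a feel for the kind of question such an analysis answers, here is a
toy simulation that replays a trace of object loads through caches of
different sizes and reports the hit rate. It is a sketch under assumptions,
not the ZEO scripts: plain LRU stands in for the real eviction policy, the
cache is sized in entries rather than bytes, and the skewed trace is
synthetic.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay object ids through an LRU cache holding `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for oid in trace:
        if oid in cache:
            hits += 1
            cache.move_to_end(oid)  # mark as most recently used
        else:
            cache[oid] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)


rng = random.Random(0)
# Hypothetical workload: 90% of loads go to 100 hot objects, the rest are
# spread over 10,000 objects.
trace = [
    rng.randrange(100) if rng.random() < 0.9 else rng.randrange(10_000)
    for _ in range(50_000)
]
rates = {size: lru_hit_rate(trace, size) for size in (10, 100, 1000)}
for size, rate in rates.items():
    print(f"cache of {size} entries: hit rate {rate:.2f}")
```

If the hit rate keeps climbing as the simulated cache grows, a larger cache
is likely to pay off; if it flattens out, the cache already covers the
working set.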

It's true that storing the same data in many caches is inefficient.

I imagine that someone will eventually figure out how to use
memcached to implement a shared ZEO cache, as has been done
for relstorage.

At PyCon, I'll be presenting work I've been doing on a load
balancer that seeks to avoid sharing the same data in multiple
caches by assigning different kinds of work to different workers.
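
The general idea can be sketched as follows. This is only a toy example and
not the design of the load balancer itself: the classifier (first URL path
segment), the worker names, and the hash-based assignment are all
assumptions made for illustration.

```python
import hashlib


def request_class(path):
    """Hypothetical classifier: the first URL path segment is the kind of work."""
    return path.strip("/").split("/", 1)[0]


def pick_worker(path, workers):
    # Use a stable hash (not the built-in hash(), which is randomized per
    # process for strings) so every balancer process agrees on the mapping.
    digest = hashlib.sha1(request_class(path).encode("utf-8")).digest()
    return workers[int.from_bytes(digest[:4], "big") % len(workers)]


workers = ["worker-a", "worker-b", "worker-c"]
first = pick_worker("/events/123/timetable", workers)
second = pick_worker("/events/456/registration", workers)
# Both requests are "events" work, so they land on the same worker and
# reuse the objects already in that worker's cache.
```

A real balancer would also have to weigh worker load, which this sketch
ignores.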

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton