[ZODB-Dev] Server-side caching

Laurence Rowe l at lrowe.co.uk
Mon Feb 13 12:19:27 UTC 2012


On 13 February 2012 10:06, Pedro Ferreira <jose.pedro.ferreira at cern.ch> wrote:
>> The OS' file-system cache acts as a storage server cache.  The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided by
>> the storage server.
>
>
> I see, then I guess it would be good to have at least the same amount of RAM
> as the total size of the DB, no? From what I see in our server, the linux
> buffer cache takes around 13GB of the 16G available, while the rest is
> mostly taken by the ZEO process (1.7G). The database is 17GB on disk.

Adding enough memory so the database fits in RAM is always a good idea.

Since the introduction of blobs, this should be possible (and
relatively cheap) for most ZODB deployments. For Plone sites, a 30GB
pre-blobs Data.fs typically falls to 2-3GB with blobs.

There's also the wrapper storage zc.zlibstorage, which compresses ZODB
records, allowing more of the database to fit in RAM. (RelStorage also
has an option to compress records.)
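
To get a rough feel for why record compression helps, here is a minimal
sketch using only the standard library. It is not how zc.zlibstorage is
implemented; the `state` dict is a made-up, deliberately repetitive
record, so real pickles will usually compress less well than this.

```python
import pickle
import zlib

# Hypothetical object state, invented for illustration only.
state = {
    "title": "Meeting " * 20,
    "description": "lorem ipsum " * 50,
    "tags": ["a", "b", "c"] * 10,
}

raw = pickle.dumps(state, protocol=2)   # uncompressed record bytes
packed = zlib.compress(raw)             # the same bytes, compressed

print("raw bytes:   ", len(raw))
print("packed bytes:", len(packed))

# The round trip must give back exactly the same record.
restored = pickle.loads(zlib.decompress(packed))
```

The smaller the stored records, the more of them fit in the same amount
of file-system cache.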

>> Also note that, for better or worse, FileStorage uses an in-memory index
>> of current record positions, so no disk access is needed to find current
>> data.
>
>
> Yes, but pickles still have to be retrieved, right? I guess this would mean
> random access (for a database like ours, in which we have many small
> objects), which doesn't favor cache performance.
>
> I'm asking this because in the tests we've made with SSDs we have seen a 20%
> decrease in reading time for non-client-cached objects. So, there seems to
> be some disk i/o going on.

The mean performance improvement doesn't tell the whole story here.
With most of your database in the file-system cache, median read times
will be essentially identical, but your 95th percentile read times will
show a huge decrease, as the seek time on an SSD is orders of magnitude
lower than the seek time of a spinning disk.
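
A small sketch of the mean / median / 95th percentile distinction, using
entirely synthetic numbers (the hit and miss latencies and the 10% miss
rate are assumptions chosen for illustration, not measurements):

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def read_times(miss_ms, hit_ms=0.1, reads=1000, miss_every=10):
    """Synthetic read latencies: every `miss_every`-th read goes to disk."""
    return [miss_ms if i % miss_every == 0 else hit_ms for i in range(reads)]


hdd = read_times(miss_ms=10.0)  # assumed spinning-disk cost of a cache miss
ssd = read_times(miss_ms=0.2)   # assumed SSD cost of a cache miss

for name, times in (("hdd", hdd), ("ssd", ssd)):
    print(name,
          "mean=%.2f" % statistics.mean(times),
          "median=%.2f" % statistics.median(times),
          "p95=%.2f" % percentile(times, 95))
```

In this toy data set the median is the same for both disks because most
reads never touch the disk; only the tail changes.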

Even when you have enough RAM so the OS can cache the database in
memory, I still think SSDs are worthwhile. Packing the database,
backing up or any operation that churns through the disk can all cause
the database to drop out of the file-system cache. Be sure to choose
an SSD with capacitor backup so it won't lose your data on power
failure; see:
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html.

>> In general, I'd say no.  It can depend on lots of details, including:
>>
>> - database size
>> - active set size
>> - network speed
>> - memory and disk speeds on clients and servers
>> - ...
>
>
> In any case, from what I see, these client caches cannot be shared between
> processes, which doesn't make them very useful in our case, in which we
> have many parallel processes asking for the same objects over and over again.

You could try a ZEO fanout setup too, where you have a ZEO server
running on each client machine. The intermediary ZEO's client cache
(you could put it on tmpfs if you have enough RAM) is then shared
between all the clients running on that machine.
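
As a toy model of the idea (plain Python, not ZEO code; the class names
are hypothetical, and invalidations, cache size limits and the network
are all ignored), compare how many loads reach the central server when
8 processes each read the same 100 objects:

```python
class Upstream:
    """Stands in for the central storage server; counts loads it serves."""

    def __init__(self):
        self.loads = 0

    def load(self, oid):
        self.loads += 1
        return "state-%d" % oid


class CachingLayer:
    """Stands in for anything that caches loads in front of a backend."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def load(self, oid):
        if oid not in self.cache:
            self.cache[oid] = self.backend.load(oid)
        return self.cache[oid]


def run(workers, oids):
    """Have every worker process read every object once."""
    for worker in workers:
        for oid in oids:
            worker.load(oid)


oids = range(100)

# Each process has only its own private cache in front of the server.
direct = Upstream()
run([CachingLayer(direct) for _ in range(8)], oids)

# Fanout: the processes go through one shared intermediary cache.
fanout = Upstream()
intermediary = CachingLayer(fanout)
run([CachingLayer(intermediary) for _ in range(8)], oids)

print("central loads without fanout:", direct.loads)
print("central loads with fanout:   ", fanout.loads)
```

Without the intermediary, every process fetches each object from the
central server itself; with it, only the first request per object does.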

Laurence

