[ZODB-Dev] RE: [Zope-CMF] Big CMF sites / Storage

Barry A. Warsaw barry@zope.com
Thu, 31 Jan 2002 11:03:01 -0500


>>>>> "TD" == Toby Dickenson
>>>>> <tdickenson@devmail.geminidataloggers.co.uk> writes:

    TD> FileStorage is 'damn fast'. I'm currently using BerkeleyStorage
    TD> which is thought to be roughly 10x slower. However I have
    TD> never seen that make a difference to overall *system*
    TD> performance (even when looking carefully for that difference).

That's very interesting to know, and perhaps it makes sense if our
hypothesis about why BerkeleyStorage is slow is correct (though there
are some doubts; see below).

I'm fairly well convinced that the write performance of
BerkeleyStorage is limited by BerkeleyDB's inability to handle big
blob writes well.  In a normal migration test (i.e. moving all
transactions from a FileStorage to BDB) we can see that the big
pickles take up a huge number of overflow pages, and it's my
understanding that overflow pages in BDB are just dog slow.  Talking
to the Sleepycat folks has somewhat confirmed this.
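
To put a rough number on it, assuming the Sleepycat rule of thumb
that a btree item bigger than about a quarter of the page size gets
pushed onto overflow pages (the threshold and per-page overhead below
are guesses, not gospel):

    def estimate_overflow_pages(pickle_size, page_size=8192, overhead=26):
        # Assumed threshold: items over ~1/4 of a page go to overflow.
        threshold = page_size / 4
        if pickle_size <= threshold:
            return 0                     # fits on a regular leaf page
        usable = page_size - overhead    # assumed payload per overflow page
        return (pickle_size + usable - 1) / usable   # ceiling division

    # e.g. a 100KB pickle with 8KB pages costs about 13 overflow pages,
    # each of them a separate page allocation and write.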

However, I've done some testing that makes me uncertain about our
hypothesis that it's overflow page writing that bogs down BDB.  I've
tried things like truncating pickles to 2000 bytes, cranking up page
sizes, etc., and I haven't seen much improvement even when the number
of overflow pages is reduced.  Overflow pages never go to zero, as
they do when I truncate pickles to 1 byte, so perhaps that's still
the problem and I just don't understand how key/value pairs are
packed onto pages.
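
For what it's worth, a direct way to check would be something like
this untested sketch: write same-sized dummy pickles into a btree and
ask BDB how many overflow pages it allocated.  The 'over_pg' stat key
is what bsddb3 is supposed to expose for the btree bt_over_pg field,
so check your version:

    from bsddb3 import db

    def count_overflow_pages(pickle_size, page_size, nrecords=1000):
        d = db.DB()
        d.set_pagesize(page_size)        # must be set before open
        d.open('/tmp/ovfl-test.db', db.DB_BTREE,
               db.DB_CREATE | db.DB_TRUNCATE)
        data = 'x' * pickle_size         # stand-in for a real pickle
        for i in range(nrecords):
            d.put('key%08d' % i, data)
        stats = d.stat()
        d.close()
        return stats['over_pg']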

We also haven't done any tests on a machine with multiple disks and
controllers, so that the log files and the data files could sit on
separate spindles.  The Sleepycat docs do suggest this, and at one
point we were trying to put together such a machine for some testing,
but other things took precedence.
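
If I have the bsddb3 DBEnv API right, the split itself is just a
couple of calls (the same thing can be done with set_lg_dir and
set_data_dir lines in a DB_CONFIG file in the environment home); the
paths here are made up:

    from bsddb3 import db

    env = db.DBEnv()
    env.set_lg_dir('/disk2/bdb-logs')    # transaction logs on spindle 2
    env.set_data_dir('/disk1/bdb-data')  # data files on spindle 1
    env.open('/disk1/bdb-home',
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN)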

However, let's imagine that it's write performance that sucks
because of the big blobs, and that read performance isn't as bad.
Throw in all the caching that goes on in your average Zope/ZEO
installation, and perhaps you can eliminate most of the bottleneck at
the backing storage layer.  That's encouraging, for sure.
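
And for anyone who wants to lean on that caching harder: the ZEO
client cache size is a knob on ClientStorage, if I remember the 1.x
signature right (the address and size here are made up):

    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    storage = ClientStorage(('zeo-server', 8100),
                            cache_size=50*1024*1024)  # bigger client cache
    db = DB(storage)  # the per-connection object cache sits on top of this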

-Barry