[ZODB-Dev] RE: [Zope-CMF] Big CMF sites / Storage

sean.upton@uniontrib.com sean.upton@uniontrib.com
Thu, 31 Jan 2002 09:48:32 -0800


Thanks for the suggestions; a few clarifications on what I am thinking...

1 - All connections are 100bT Full Duplex.  Front end HTTP traffic (between
Squid and ZEO clients) is on one VLAN/network; ZEO client to ZSS connections
are on another; the only other thing running on the same network as the
ZC->ZSS link is the UDP heartbeat for clustering/failover between the
primary and hot-backup ZSS - I don't think that this is a huge bandwidth
eater (small UDP messages every 5 seconds).  I do get concerned about
latency, but figure that the latency improvement of Gigabit Ethernet would
not justify its use.

2 - I hadn't considered latency problems in revalidation, though I assume
there is at least some benefit to ZEO client caching of a heavy-traffic
site's objects, even when the connection to the ZSS goes over the network?

3 - I'm paranoid about page-load performance, so I'm planning on putting
most of my effort into a fast front-end infrastructure (via Squid and your
ICP Zope patches) to take pressure off the back end.  That said, I'm also
concerned about intensive operations on the back end of my site that a
downstream cache won't take care of.  Anything involving a large write at
the end of a transaction, like a Catalog reindexing, will likely be done
via a ZEO client running on the ZSS box itself, using Unix sockets instead
of one of the LAN-connected clients.
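
That split can be expressed in Zope's custom_zodb.py hook.  This is only a
configuration sketch under the assumption that the ZSS is listening on a
Unix-domain socket at the hypothetical path shown; ZEO's ClientStorage
accepts either a socket path string or a (host, port) tuple.

```python
# custom_zodb.py sketch for the ZEO client co-located with the ZSS box.
# The socket path is hypothetical - match it to how the server is started.
from ZEO.ClientStorage import ClientStorage

# A string address selects a Unix-domain socket; the LAN-connected
# clients would instead pass a ('hostname', port) tuple.
Storage = ClientStorage('/var/run/zeo.sock')
```

The LAN clients keep their TCP addresses; only the co-located client that
does the heavy Catalog work points at the local socket.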

Sean

-----Original Message-----
From: Toby Dickenson [mailto:tdickenson@devmail.geminidataloggers.co.uk]
Sent: Thursday, January 31, 2002 4:23 AM
To: sean.upton@uniontrib.com
Cc: chrisw@nipltd.com; zodb-dev@zope.org
Subject: Re: [ZODB-Dev] RE: [Zope-CMF] Big CMF sites / Storage


On Wed, 30 Jan 2002 17:10:42 -0800, sean.upton@uniontrib.com wrote:

>I've posted this to ZODB-dev for further discussion about scaling big
>ZODBs...

I benchmarked some storage options earlier this year.  Results are at
http://www.zope.org/Members/htrd/benchmarks/storages

> a bit of background on my most current project: a CMF site with
>potential for 1million+ objects.
>
>Regarding hardware... I'm trying to forecast what to buy, and this is what
>I'm guessing at the moment...  All boxes are likely to be Dual Athlon MP
>boxes (1.2 & 1.56 GHz), likely with 1GB of RAM on the Zope clients and 3GB
>on the ZSS box; the ZSS will be running a RAID10 of 4 10kRPM drives (via a
>Mylex 170 with 16MB cache).

I suspect it is unlikely that your storage server would make good use
of 3GB.  At the moment all storages need *a lot* of RAM for packing;
this is probably the only time you would need more than 128MB if using
BerkeleyStorage.

(Later this year my BerkeleyStorage will be hitting a RAM-pack
ceiling, so this is unlikely to go unfixed.)

>  The ZEO client cache will be run on a software RAID0 of two
>volumes on 7200RPM IDE disks...

How much bandwidth do you have between ZEO servers and clients?  If
they are on a LAN then a large ZEO client cache will bring more
problems than advantages: the cache has to be revalidated at the
start of each connection.
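
A back-of-envelope sketch of that revalidation cost; the object count,
round-trip time, and batch size below are illustrative assumptions, not
measurements of any real ZEO deployment.

```python
# Back-of-envelope cost of revalidating a ZEO client cache when a
# connection (re)opens.  All numbers here are illustrative assumptions.

def revalidation_seconds(cached_objects, rtt_seconds, oids_per_roundtrip):
    """Lower bound: round trips needed, times network round-trip time."""
    round_trips = -(-cached_objects // oids_per_roundtrip)  # ceiling division
    return round_trips * rtt_seconds

# 200,000 cached objects, 0.2 ms LAN round trip, 100 oids checked per trip:
print(revalidation_seconds(200_000, 0.0002, 100))  # pure network time, ~0.4 s
```

The point being that the bigger the client cache, the longer each
reconnection stalls before the client is useful again.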

>If I understand correctly, FileStorage might make more sense from a
>performance perspective

FileStorage is 'damn fast'.  I'm currently using BerkeleyStorage, which
is thought to be roughly 10x slower.  However, I have never seen that
make a difference to overall *system* performance (even when looking
carefully for that difference).


The only thing you didn't mention was separating front-end HTTP traffic
from back-end ZEO traffic onto different NICs.  ZEO suffers badly when
network latency gets high, because of the large number of round trips.
This may be worth considering if your HTTP traffic is high enough.
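
The round-trip sensitivity can be sketched numerically; the per-page
object-load count and the latency figures below are assumptions chosen
for illustration only.

```python
# Rough model of ZEO's latency sensitivity: each object load that misses
# the client cache costs one client<->server round trip.  The counts and
# latencies below are illustrative assumptions.

def zeo_network_ms(object_loads, rtt_ms):
    """Network time spent per page render, in milliseconds."""
    return object_loads * rtt_ms

# 50 uncached object loads to render one page:
dedicated_nic = zeo_network_ms(50, 0.3)  # quiet back-end LAN
shared_nic = zeo_network_ms(50, 5.0)     # latency inflated by HTTP bursts
print(dedicated_nic, shared_nic)  # 15.0 vs 250.0 ms per page
```

Bandwidth barely enters into it; the multiplier on latency is what makes
a dedicated back-end NIC pay off.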



Toby Dickenson
tdickenson@geminidataloggers.com