[Zope] One big Zope or many small Zopes? Or perhaps a ZEO?

sean.upton@uniontrib.com sean.upton@uniontrib.com
Mon, 22 Oct 2001 15:01:56 -0700


I would definitely NOT suggest one single Zope if it doesn't fit your use
cases, and you would prefer to have multiple object databases.  However,
keep in mind that a single ZEO storage server process can serve up many
storages (though this is not super ideal on a multi-CPU system, see
below)...

Another, equally valid way of segmenting things is to externally mount ODBs
to your top-level folders, so that realestate resides in a different folder
(and ODB) than your main site, and use virtual hosts and your front end to
serve several virtual Zope hosts, each using its own ODB, from one Zope;
the disadvantage is that this won't scale as well on a dual-CPU ZEO client
node...

Keep in mind that one way of planning is to accept that Zope Servers and the
ZSS cannot really scale past one CPU... So you may benefit from several
Zopes on one box, accessing a backend ZEO storage Server (or multiple
storage servers, if you have several CPUs and fast storage, and already
segment things).  In this case, a bit of segmentation might actually buy you
some performance.

Sean

-----Original Message-----
From: Bill Blevins [mailto:bblevins@fredericksburg.com]
Sent: Saturday, October 20, 2001 4:46 AM
To: sean.upton@uniontrib.com; kirk@strauser.com; zope@zope.org
Cc: Christopher Muldrow
Subject: Re: [Zope] One big Zope or many small Zopes? Or perhaps a ZEO?


Thanks Sean (and Kirk).

So, y'all are saying we should have ONE zope managed over several servers
with ZEO (and caching)? Key question is the ONE zope.

Currently, our huge zope is for the newspaper content, then homes is on its
own install, classifieds another and business directory another and two
testing zopes on two others. (All on one box with our database pulled from
another box.)

Am I understanding that correctly?

One other quick question. With 75,000 page views per day, set up the way we
are now (one box/multiple zopes), how many connections should we be
allowing? I think the default is 4 and we have opened it up to 8. Sound
right?
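
For a rough sense of scale, the sketch below applies Little's law (busy
threads = arrival rate x time per request) to the 75,000 page views per day
mentioned above. The peak factor and the half second per request are
assumptions chosen for illustration, not measurements, and one page view may
trigger more than one Zope request.

```python
SECONDS_PER_DAY = 86400.0


def threads_needed(page_views_per_day, peak_factor, seconds_per_request):
    """Estimate how many worker threads are busy at peak (Little's law)."""
    average_rate = page_views_per_day / SECONDS_PER_DAY  # requests per second
    peak_rate = average_rate * peak_factor                # assumed burstiness
    return peak_rate * seconds_per_request


# 75,000 comes from the message; peak_factor=5 and 0.5 s are assumptions.
busy = threads_needed(75000, peak_factor=5, seconds_per_request=0.5)
print("threads busy at peak: %.1f" % busy)
```

Under those assumptions the estimate stays well below 8 threads; slower
pages or sharper peaks would push it up proportionally.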

--
Bill Blevins
fredericksburg.com

###


On 10/19/01 3:14 PM, "sean.upton@uniontrib.com" <sean.upton@uniontrib.com>
wrote:

> We use ZEO + Squid, with plenty of fast hardware and segmented VLANs for
> performance and security.  We proxy directly from Squid to ZServer, use
> squid for load-balancing, and use a squid redirector for virtual host
> support.  Squid does indeed help out a ton...  Caching proxies have the
> nice benefit of allowing you to keep images in Zope, but have the speed of
> serving them from in-memory caches (for in-transit and frequently-used
> images).  We currently are running up to 150,000 page views / day through
> this (classifieds site), and eventually will have up to 1 million+ page
> views (well, on peak days) per day as we move more portions of our site
> onto this setup away from primarily static publishing, and we feel pretty
> confident with the combo, especially with what you might call
> "semi-dynamic" content, like published items that get requested many
> times, which works well for newspaper web site content, like classifieds
> ads, editorial, and vertical advertising content like MLS listings.
> 
> We are likely to move a lot of stuff onto Zope gradually over the next
> year, and we plan on having a lot of traffic, and a need to deal with it.
> Here's a simplified breakdown of what we are doing:
> 
>    3 cluster tiers, between each a different VLAN
>    ==============================================
> 
>                                 Caching Proxy
>                             =====================
>      [cache1]::::[cache2]
>           | \                ZEO Client / Apache
>           v  \              =====================
>      [node1]<-x->[node2]
>           |   ___/          ZSS/NFS/MySQL Cluster
>           v  /              =====================
>      [storage]::::[hot_backup_storage]
> 
> For reliability, each node in each of these three tiers uses clustering
> software (Linux-HA/heartbeat, which is really simple, free, and just does
> IP takeover when it doesn't see the heartbeat of its peer). We are likely
> to add ZEO client nodes as our traffic and application needs grow.
> 
> If you wanted to get a lot of the same benefits with much less hardware,
> you could set up a 2-box arrangement:
> 
> 1 - Get a fast dual-CPU box, with internal hardware RAID, and 1GB+ RAM,
> running ZEO/ZSS and 2 ZEO client processes, as well as your relational
> database software.  You use ZEO so your Zope processes can take advantage
> of multiple processors.
> 
> 2 - Get another similar box, and run Squid on it, with a bunch of
> redirector processes (which are CPU intensive, justifying the investment
> in a dual CPU box).  In order to load-balance the 2 Zopes running on
> different TCP ports on server 1, you would have to use multiple IP
> addresses on your interface on server 1 (above).
> 
> This wouldn't reduce single-points of failure, but would be quite nice
> from a performance standpoint, and really doesn't involve a substantial
> hardware investment. This would look like:
> 
> [squid]  Running Squid, load-balancing 2 ZEO client
> | |                   processes
> v v
> [node]  For example, Serving Zope at 10.1.1.1:8080
>                     and 10.1.1.2:8080
>   This would require you to bind Z2.py to an interface
>        This box would also run your Relational Databases, and the ZSS.
> 
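
The load-balancing arrangement in the diagram can be sketched as a Squid
redirector helper in Python. This is only a minimal sketch, not the
redirector described in the message: it assumes the classic helper protocol
(one request per line on stdin with the URL as the first field; one
rewritten URL, or a blank line for "no change", on stdout) and simply
alternates between the two example addresses. The names make_rewriter and
main are hypothetical, and virtual-host mapping is left out.

```python
import itertools
import sys
from urllib.parse import urlsplit, urlunsplit

# Example addresses from the diagram; adjust to your own Zope processes.
BACKENDS = ["10.1.1.1:8080", "10.1.1.2:8080"]


def make_rewriter(backends):
    """Return a function that rewrites request lines round-robin."""
    pool = itertools.cycle(backends)

    def rewrite(line):
        fields = line.split()
        if not fields:
            return ""  # blank reply: leave the request unchanged
        parts = urlsplit(fields[0])
        if parts.scheme != "http":
            return ""  # only plain HTTP requests are redirected here
        # Keep path and query, swap in the next backend host:port.
        return urlunsplit(("http", next(pool), parts.path, parts.query, ""))

    return rewrite


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Helper loop: answer one line per request and flush immediately."""
    rewrite = make_rewriter(BACKENDS)
    for line in stdin:
        stdout.write(rewrite(line) + "\n")
        stdout.flush()
```

Calling main() from a small wrapper script would turn this into a helper
process; each Zope would still need to be bound to its own address, as noted
above.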
> Servers we are looking into for next year that fit this kind of
> description are, for example, Appro 1124, a 1 rack unit box, with Dual
> 1.5GHz Athlon CPUs, which is a nice box designed around the Tyan Thunder
> K7 mainboard, and this is likely the fastest dual-CPU x86 box you can buy,
> and it is reasonably priced, given it was built with somewhat commodity
> standard components (though it looks to handle heat well, from reviews I
> have read of this).
> 
> Anyway, this is just my take on the best way to address this; others may
> feel differently, but I feel this is an excellent strategy for an online
> newspaper site, or another site that "publishes" content accessed in
> similar ways by many users.
> 
> Sean
> 
> =========================
> Sean Upton
> Senior Programmer/Analyst
> SignOnSanDiego.com
> The San Diego Union-Tribune
> 619.718.5241
> sean.upton@uniontrib.com
> =========================
> 
> 
> -----Original Message-----
> From: Kirk Strauser [mailto:kirk@strauser.com]
> Sent: Friday, October 19, 2001 9:44 AM
> To: zope@zope.org
> Subject: Re: [Zope] One big Zope or many small Zopes? Or perhaps a ZEO?
> 
> 
> 
> At 2001-10-19T16:22:27Z, Chris Muldrow <muldrow@mac.com> writes:
> 
>> Also, we're running at traffic somewhere around 60,000-80,000 page views
>> a day--not huge traffic, but more than we had a year ago, certainly. At
>> what traffic point have most folks noticed a need for more server power?
>> Is it 100,000 page views? More? Less?
>> 
>> We are also serving ads to the Zopes from a different Windows 2000 server
>> running Apache and using a PERL process to serve the ads at a rate of
>> between 400,000 and 700,000 ads per day.
> 
> Note: I'm a Zope newbie, as anyone reading my last week's worth of
> postings can tell, but I'm not completely inexperienced at network design.
> 
> I would strongly recommend the use of a proxy/cache in front of your
> servers.  It sounds as if much of your content is pseudo-static.  That is,
> although it may change, it's likely to do so slowly.  Caching servers can
> make a vast difference in performance in setups like this.  For example,
> suppose that users often go to:
> 
> http://mynewspaper.com/sports/todays_headlines
> 
> Why force your Zope to regenerate that page 30,000 times per day when it
> may only change 3 or 4 times?  Zope even has built-in methods for cache
> management so that you can have it send special headers to the cache
> servers to tell them how often to re-query specific objects.  You may not
> want to cache a stock ticker at all.  OTOH, the current temperature won't
> drastically change in any given 5 minute interval.
> 
> I haven't personally used these methods yet (see the first line of my
> post), but I *can* certify that a properly-configured Squid server can
> increase your current platform's potential throughput by several hundred
> percent.
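
The "special headers" idea can be illustrated with a small Python sketch
that picks an HTTP Cache-Control value per kind of content. The content
kinds and the 15 minute lifetime for headlines are assumptions made up for
this example; only the 5 minute temperature interval and the uncached stock
ticker come from the message. CACHE_SECONDS and cache_headers are
hypothetical names, not part of Zope.

```python
# Lifetimes in seconds for each (hypothetical) kind of content.
CACHE_SECONDS = {
    "headlines": 15 * 60,    # assumed: changes only a few times per day
    "temperature": 5 * 60,   # from the message: stable over 5 minutes
    "stock_ticker": 0,       # from the message: do not cache at all
}


def cache_headers(kind):
    """Return the response headers a caching proxy should see for `kind`."""
    max_age = CACHE_SECONDS.get(kind, 0)  # unknown kinds are not cached
    if max_age <= 0:
        return {"Cache-Control": "no-cache"}
    return {"Cache-Control": "public, max-age=%d" % max_age}


for kind in ("headlines", "temperature", "stock_ticker"):
    print(kind, cache_headers(kind))
```

In a Zope method, each name/value pair could then be handed to
RESPONSE.setHeader before the page is returned, so that Squid serves repeat
requests from its cache until the lifetime expires.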