[Zope] ZEO and a front end...

Toby Dickenson tdickenson@geminidataloggers.com
Tue, 18 Jul 2000 14:32:51 +0100


On Tue, 18 Jul 2000 04:22:16 -0600, Bill Anderson <bill@libc.org>
wrote:

>> I think most people seem to be missing the point here.
>> 
>> The idea is that ALL servers can serve ALL content.  HOWEVER, the 'load
>> balancer' will opt for a certain server for a certain URL, in order to
>> improve cache hits.
>> 
>> So, for www.contrived-example.com/dir1  it will first try server1, but if
>> it's busy (or down) it will try others.  This way, the cache on server1 is
>> more likely to contain objects relevant to /dir1  and thus have a higher hit
>> rate, therefore improving performance.
>
>No, I understand what is being discussed, I doubt the problem. :-)

You are right, there's no problem in the scenario you described.

I'll fill in some more details about the fictional example for which
I still can't see an easy solution...

Zope is used to store books. Each book object contains:
1. The text of the book, each page in a separate object.
2. Images and diagrams for the book.
3. A ZCatalog full-text index of the book.
Each book object allows:
1. Searching, viewing pages, etc.
2. Dynamically rendering a range of pages as pdf, postscript, etc.
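
For concreteness, here's a rough sketch of such a book object in
plain Python (all names are made up, and the ZCatalog index is
reduced to a bare dictionary):

    class Book:
        """One stored book: page texts, images, and a full-text index."""

        def __init__(self, title, pages, images):
            self.title = title
            self.pages = pages     # page texts, one string per page
            self.images = images   # diagrams, keyed by name
            # Stand-in for the ZCatalog index: word -> page numbers.
            self.index = {}
            for number, text in enumerate(pages):
                for word in text.lower().split():
                    self.index.setdefault(word, set()).add(number)

        def search(self, word):
            """Return the numbers of the pages containing 'word'."""
            return sorted(self.index.get(word.lower(), ()))

        def render(self, first, last, format='pdf'):
            """Render a page range; real pdf/postscript generation
            omitted - this just concatenates the page texts."""
            return '\n'.join(self.pages[first:last + 1])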

The whole database stores 10,000 books, and is served by a cluster of
many identical Zope servers.

A typical usage pattern might be:
a. A user searches through a book to find the interesting pages.
b. He browses the pdf version of those pages.
c. He tweaks the page range, and double-checks the pdf version.
d. He then downloads a postscript version of that page range for
   printing.

Assume that no one has accessed this book recently, so it's not in
any caches.

The cache has to be filled at step b. This transfers a lot of data -
possibly the whole content of the book - and introduces a noticeable
delay.

The possibility for optimisation comes at steps c and d. There is one
cache already filled with the right data - if the requests from steps
c and d can be directed to the same server as the original request,
then the cache-filling delay can be avoided.
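
To put some code behind that: the expensive part of step b is
pulling the book's objects across ZEO into a local cache, and steps
c and d are only cheap on the server that paid that cost. A toy
sketch (slow_fetch_from_zeo is a made-up stand-in for the real
storage traffic):

    import time

    book_cache = {}   # book_id -> list of pages: this server's warm copy

    def slow_fetch_from_zeo(book_id):
        """Stand-in for pulling every object of a book across ZEO."""
        time.sleep(2)   # simulate the transfer delay seen at step b
        return ['page %d of %s' % (n, book_id) for n in range(300)]

    def get_book(book_id):
        """Fetch a book from the storage server at most once per server."""
        if book_id not in book_cache:
            book_cache[book_id] = slow_fetch_from_zeo(book_id)   # step b
        return book_cache[book_id]

    def render_range(book_id, first, last):
        """Steps c and d: re-rendering from the warm copy is cheap, but
        only on the server whose cache was filled at step b."""
        return '\n'.join(get_book(book_id)[first:last + 1])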

This extra delay might not have a great impact on actual site
performance, but I've found a catastrophic effect on perceived
performance in some usability tests. Users seem happy to accept a
delay when they first access their data, but not if it is repeated
in a subsequent request.

Bill wrote...

> http://my.site.com/sec1 is mapped to: sec1.site.com, which
> is load balanced across as many machines as possible

I might be reading more into his words than was intended, but I think
this demonstrates the problem. Distributing multiple requests for one
section across multiple servers is (what I consider to be)
undesirable.

I want to move load balancing up one level of abstraction -
distributing sections across machines (rather than connections).
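
Something along these lines would do it: hash the section of the URL
to a preferred server, and only fall back to the others when it is
busy or down - Bill's /dir1 scenario, but with the section as the
unit of distribution. A sketch, with made-up server names and a
pluggable health check:

    import hashlib

    servers = ['zope1', 'zope2', 'zope3']   # made-up backend names

    def section_of(path):
        """The unit of distribution: the first path component (one book)."""
        return path.strip('/').split('/')[0]

    def pick_server(path, is_up=lambda server: True):
        """Prefer one fixed server per section; fall back when it's down."""
        digest = hashlib.md5(section_of(path).encode()).hexdigest()
        preferred = int(digest, 16) % len(servers)
        # Walk the list from the preferred server, in a fixed order, so
        # every balancer node makes the same choice for the same section.
        for offset in range(len(servers)):
            candidate = servers[(preferred + offset) % len(servers)]
            if is_up(candidate):
                return candidate
        raise RuntimeError('no servers available')

Every request for one book then lands on the same warm cache,
whichever balancer node handles the connection, until that server
fails.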

>If that isn't enough, you can throw eddieware into the mix, which
>*already* has the ability to redirect based upon the URL.

I've not seen eddieware before - so it looks like I've got some
reading to do.

At first glance it doesn't have any integrated http caching
(although it seems to have everything else ;-) and there's no obvious
place to hang squid. In my example above, I really want to be able to
cache the rendered pdf files.



Toby Dickenson
tdickenson@geminidataloggers.com