[Zope] Zope Cacheability

Garth T Kidd gtk@well.com
Thu, 28 Oct 1999 10:09:38 +1000


I think I've figured out why www.zope.org is so slow, and it's got nothing
to do with the hardware or running the mailing list on the server itself.

Quite simply, www.zope.org is doing anywhere up to seven times as much work
as it needs to do.

If you want the world's proxy servers to co-operate with your site to reduce
the load on your servers, and want to be able to front-end your servers with
reverse proxy caches to *really* take the load off your servers, you need to
generate headers which tell them when they need to check back to see whether
your object has been modified and -- for when they check -- when the object
they're requesting was last modified.
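To make that concrete, here's a rough Python sketch of the two kinds of header. A freshness header (Expires, plus the HTTP/1.1 Cache-Control: max-age equivalent) tells caches how long they can go without checking back. A validator (Last-Modified) gives them something to check against when they do. The function name and parameters are made up for illustration; this isn't Zope's API.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(last_modified, max_age_seconds, now=None):
    """Build caching headers for one response (illustrative helper, not Zope API).

    last_modified: timezone-aware datetime of the object's last real change.
    max_age_seconds: how long caches may reuse the object without checking back.
    now: current time as an aware datetime; defaults to the real clock.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expires = (now + timedelta(seconds=max_age_seconds)).astimezone(timezone.utc)
    return {
        # Validator: the date a cache echoes back in If-Modified-Since.
        "Last-Modified": format_datetime(
            last_modified.astimezone(timezone.utc), usegmt=True),
        # Freshness for HTTP/1.0 caches: an absolute expiry time.
        "Expires": format_datetime(expires, usegmt=True),
        # The same freshness lifetime for HTTP/1.1 caches.
        "Cache-Control": "max-age=%d" % max_age_seconds,
    }
```

For the home page, max_age_seconds=600 would give the "expires in ten minutes" behaviour suggested below; for images, 3600 or 86400 (an hour or a day) seems plenty.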

I spotted www.zope.org's problem whilst using it as an example site to check
that a proxy I was installing was working properly. The log file was full of
misses, even upon reloads. Checking later with Mark Nottingham's
cacheability checker (written in Python) told the whole story:

http://www.ircache.net/cgi-bin/cacheability.py

The home page itself will always be stale. That's arguably okay because it's
so dynamic, but we could give ourselves pretty good dynamism and still avoid
a lot of needless hits if we whack an "expires in ten minutes" header item
in there. If we're waiting for a team to approve a news item, we can wait
another ten minutes for the page to expire out of everyone's caches. More
subtle thinking on this is below.

The following images referenced by the home page are completely
non-cacheable:

  Images/blue-rounder1.gif
  Images/blue-rounder2.gif
  Images/redhat.gif
  Images/apache.gif
  Images/python.gif
  Images/spacer.gif (hit more than once)
  SpotLightOn/garage

So, every time you hit www.zope.org, your browser probably makes a good
seven needless requests. That makes you wait, and needlessly loads down the
server. It's not just the main page, though. Every page you hit has at least
the first five of those. No wonder the server is crawling!

The only good news is that Images/zpowered.jpg is cacheable. It was last
modified about ten weeks ago, according to the server. Proxies will have to
check back to find out whether or not it has been updated, but at least the
server can just tell them it hasn't, instead of sending them the whole image
again as it does for the others. Adding an Expires header to this would help,
too.
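Here's roughly what "tell them it hasn't" looks like on the server side: the proxy sends If-Modified-Since with the date of the copy it holds, and if the object hasn't changed since then, the server answers 304 Not Modified with no body. A minimal Python sketch, assuming the object's modification time is available as a timezone-aware datetime; the helper name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_not_modified(if_modified_since: Optional[str], last_modified: datetime) -> bool:
    """True if the server may answer "304 Not Modified" instead of resending.

    if_modified_since: raw If-Modified-Since request header, or None if absent.
    last_modified: timezone-aware datetime of the object's last change.
    """
    if not if_modified_since:
        return False  # unconditional request: send the whole object
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore it and send the whole object
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" parses as naive
    # HTTP dates only have one-second resolution.
    return last_modified.replace(microsecond=0) <= since
```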

Why zpowered.jpg is cacheable but none of the others are is beyond me.
Whatever's going on, though, we should fix it. There's no reason those guys
can't be sending an accurate Last-Modified and a reasonable Expires (say, an
hour or a day in the future -- how often are you going to change 'em?).

---

It's also fun to hit http://squishdot.org/ with the cacheability checker.

We have to suffer a redirect to get to the actual content, which wastes a
hit. Neither the redirect nor its destination is cacheable.

Most of Squishdot's images are cacheable -- they issue Last-Modified
headers -- but will be checked once per page view by any vigilant cache. The
non-vigilant ones will probably use the "if it's been around this long it'll
be around a quarter again as long" kludge. A good reason to whack an
explicit Expires header in is that you get to override that. Result:
vigilant proxies lay off, and non-vigilant proxies pay more attention. What
more could you want?
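A sketch of the freshness calculation a proxy might make, to show why an explicit Expires beats the heuristic. The 0.25 factor follows the "a quarter again as long" rule of thumb above; real caches choose their own factor, so treat it as an assumption. Function and parameter names are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional


def freshness_lifetime(date: datetime,
                       last_modified: Optional[datetime] = None,
                       expires: Optional[datetime] = None,
                       factor: float = 0.25) -> timedelta:
    """How long a cache might treat a response as fresh.

    date: when the response was generated (its Date header).
    An explicit Expires always wins; otherwise fall back to the heuristic of
    `factor` times the object's age at that moment.
    """
    if expires is not None:
        return max(expires - date, timedelta(0))
    if last_modified is not None:
        return max((date - last_modified) * factor, timedelta(0))
    return timedelta(0)  # no hints at all: revalidate every time
```

For an image last modified ten weeks ago, the heuristic alone gives 17.5 days; an Expires one hour out caps that at an hour, while still sparing vigilant caches a check on every page view.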

Squishdot uses http://nedstat.tripod.com/ as an access counter. Sprung!

Ironically, the only image that is cacheable on www.zope.org -- the Powered
By Zope logo -- is uncacheable on Squishdot. D'oh!

---

Once I've got access to the guts of a public Zope server (should be less
than a week) I'm going to start drilling into Zope cacheability and figuring
out how easy it is to fix. I strongly suspect the answer is going to be "not
very hard at all", except perhaps for the mucking around deeper in Zope's
internals required to get Last-Modified answers without rendering an object
(see below).

The goal is simple: figure out what modifications can be made to Zope's
basic items so that they're at least reasonably cacheable. The best
reference for figuring out what headers we want to generate is probably
Mark's Caching Tutorial for Web Authors and Webmasters:

  http://www.mnot.net/cache_docs/

... and as well as patching Zope up, we can ask Mark to update his document
to add Zope to the list of potentially cache-friendly live content engines.
:)

As well as tackling the images, which is the easiest Big Win, we could hack
some code together to figure out when a page last mutated enough to be
considered "modified". For the www.zope.org front page, that's probably when
the last news item was approved or the last spotlight was added, whichever
was more recent.

This hack can't be more than a few lines of code, right? I'm thinking along
the lines of a last_modified DTML method which returns the time the object
was created unless an "always_stale" property is true. For bonus points, we
could use a dependency_list property to have last_modified check the objects
we depend on (the list of recent news items, for example) automatically.
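Here's a Python sketch of that last_modified idea. Item and its created, always_stale and dependency_list attributes are hypothetical stand-ins (the property names are the ones proposed above), not real Zope objects or APIs. A return value of None means "always stale, don't send Last-Modified". It assumes dependency lists never form a cycle.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a Zope object; not a real Zope class."""
    name: str
    created: datetime                 # when the object was created
    always_stale: bool = False        # proposed property: never cache this
    dependency_list: List["Item"] = field(default_factory=list)  # proposed property

    def last_modified(self) -> Optional[datetime]:
        """Newest time among this item and everything it depends on.

        Returns None when this item, or anything it depends on, is always stale.
        Assumes dependency lists contain no cycles.
        """
        if self.always_stale:
            return None
        newest = self.created
        for dep in self.dependency_list:
            dep_time = dep.last_modified()
            if dep_time is None:
                return None  # one always-stale dependency spoils the lot
            newest = max(newest, dep_time)
        return newest
```

For the www.zope.org front page, the dependencies would be the latest approved news item and the latest spotlight, so the page's last_modified comes out as whichever of those is newer.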

As I mentioned above, Zope will still end up rendering the entire object if
we generate the header items using <dtml-call>. If we want an improvement
there, we need to dip deeper into Zope's guts. We'll need some Serious Zope
Talent to intervene there. :)
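The ordering is the whole trick: work out Last-Modified cheaply first, and only call the expensive render if the client's copy is out of date. A minimal Python sketch; get_last_modified and render are hypothetical callables standing in for whatever hook into Zope's internals ends up providing them.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, get_last_modified, render):
    """Answer a GET, calling the expensive render() only when really needed.

    request_headers: dict of incoming header names to values.
    get_last_modified: cheap, hypothetical hook returning an aware datetime,
        or None if the object is always stale.
    render: expensive, hypothetical hook returning the page body as bytes.
    Returns a (status, headers, body) tuple.
    """
    last_modified = get_last_modified()  # cheap: no rendering yet
    ims = request_headers.get("If-Modified-Since")
    if last_modified is not None and ims:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # malformed header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified.replace(microsecond=0) <= since:
                return 304, {}, b""  # client's copy is current; render() skipped
    headers = {}
    if last_modified is not None:
        headers["Last-Modified"] = format_datetime(
            last_modified.astimezone(timezone.utc), usegmt=True)
    return 200, headers, render()  # only now pay for rendering
```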

Any volunteers?

Regards,
Garth.