[Zope] Scalability bombing missions (Was: Re: [Zope] ArsDigita .. request for comments)

Patrick Phalen zope@teleo.net
Sat, 4 Mar 2000 09:28:27 -0800


[Jimmie Houchin, on Sat, 04 Mar 2000]

:: As a prodigal son here, I embraced Bobo/Zope early, became concerned
:: about scalability, wandered the websphere, and have come back home. I
:: have looked at Java/Servlets/Enhydra, AOLserver/ACS, etc. I decided to
:: follow my heart. I like Python and Zope. I like the Python and Zope
:: communities. This is where I want to make a difference and contribute.
<snip>
:: If I believed throwing $$$dollars to solve scalability was required. I
:: would much rather throw $$$dollars at Digital Creations. Instead of a
:: $20,000 to $500,000 on up license of Oracle which is per machine, per
:: website, and potentially expires. Instead of requiring buying big, big
:: hardware for $$$dollars. Throw the money towards Digital Creations or
:: other Zope or Python developers to solve the issues which hold
:: Zope/Python back. In the end you will have helped build a bigger, better
:: open source software solution.


Welcome back, Jimmie. I enjoyed reading your analysis and agree with
the thrust of it.

One thing you express, and which seems to have ossified into the current
received wisdom, is that scalability will be improved when some
mythical deep-pockets consulting client pays to get it improved; then
everyone else will benefit. The problem is with the leap of faith it
requires from the client.

I currently have several clients who are very interested in Zope.
Their web sites already receive megahits. They are willing to pay money
for consulting, but certainly won't bank their businesses on the chance
that a magic key to scalability is found. As a consequence, the
greatest commitment they will make is to install Zope on a Linux server
on a VPN and "see how it goes" in a toy setting.

Therefore, I can testify that right now, today, Zope is being hurt by
the lack of a story on scalability.

If economics or policy dictates that Zope requires an "Angel" to
bankroll scalability research, we all have to be prepared for a
scenario in which that Angel never appears. Personally, I don't like
taking such a passive approach.

Two clear impediments to Zope's wide acceptance have been frequently
discussed on this list -- documentation and performance.

The documentation "story" has travelled on an arc which might be
expressed like this:

* Zope is Open Source. The community should document.
* The community can't take on the deeper Zen issues
* Volunteers from the community are time constrained
* DC hires a technical writer
* The ZDP is formed
* ZDP suffers from the lack of time available to its volunteers
* DC commits Amos to attack the general documentation problem and to
  evangelize documentation

Documentation currently seems headed on a good track. What have we
learned from this trajectory? We've all learned that the Zope experiment
is different from other Open Source projects. It *needs* the cooperation
of *both* the Zope community and Zope's authors "shooting at the same
basket" to solve this.

I believe scalability requires a similar coordinated attack and that it
can't wait to be bankrolled.

ZEO, although a bit of a black box to us at the moment, sounds, on
paper, like a logical and credible attack on one aspect of scalability
-- the "throw hardware at it" approach.

No doubt there are other hidden opportunities. What kinds of hidden
opportunities? Who knows; they're hidden. ;>) But can we bring them out
of hiding?

Some of them are simple applications of available computer science, yet
I don't see these codified or set down anywhere. Perhaps we should have
a single HOW-TO repository for techniques and metrics people have
discovered, or just happen to know, in the performance arena. This
would be a FAQ along the lines of "I want to build my Zope site for
best performance. What things should I do? What things should I avoid?"
God knows this question comes up often enough.
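
As a starting point for the "metrics" half of such a HOW-TO, here is a
minimal sketch of a timing harness in Python. It only assumes that the
thing being measured can be wrapped in a zero-argument callable; the
names measure() and fake_page() are hypothetical and are not part of
Zope. It is an illustration of the kind of number people could post,
not a recommended benchmark methodology.

```python
import statistics
import time


def measure(render, iterations=200):
    """Time repeated calls to `render` and summarize the latencies."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        render()
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "calls": iterations,
        "mean_ms": statistics.mean(latencies) * 1000.0,
        "median_ms": statistics.median(latencies) * 1000.0,
        "worst_ms": max(latencies) * 1000.0,
        # Guard against a zero total on very coarse clocks.
        "calls_per_sec": iterations / total if total > 0 else float("inf"),
    }


def fake_page():
    # Hypothetical stand-in for rendering one page; not a Zope API.
    return "".join("<li>%d</li>" % i for i in range(500))


if __name__ == "__main__":
    for name, value in measure(fake_page).items():
        print("%-14s %s" % (name, value))
```

Reporting the median and the worst case alongside the mean matters
here, because a single slow call can hide behind a healthy-looking
average.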

Then there's the design and makeup of the Zope software itself. Its
original design requirement probably didn't put scalability at the top
of the stack. Here, there are probably other areas where the user
community can't know enough about inner workings and overall design to
contribute effectively -- but which require attention and a commitment
from the folks at DC. I'll mention a couple of things that pop into my
head. These are simply conjectural -- dialog starters. I'll be happy to
be shown that I'm dead wrong about any of these conjectures.

Zope began as a brilliant notion, of Jim Fulton's devising. It began as
a few lines of code and has evolved to include better than two
megabytes of code. DC has evolved from a company with one employee to a
company with maybe a dozen engineers contributing to the code
base and is looking to hire twenty more engineers.

Among this team of top notch talent, who takes ultimate responsibility
for evangelizing and improving performance? Who leads the effort at
code review, with an eye to marshalling refactoring efforts?

What would be an example? OK, Devil's Advocate: Here's one area of
concern I have, perhaps unfounded. Digital Creations utilizes Use Cases
as a tool for design, in the Ivar Jacobson tradition. Use Cases have
many proponents and some detractors. One detractor is Bertrand Meyer,
author of the classic _Object-Oriented Software Construction_. His beef
with the use of Use Cases is that it poses a fundamental risk: it
encourages a functional approach, based on processes (actions). He sees
this as the reverse of O-O decomposition, which focuses on data
abstraction. He believes it carries a risk of reverting, under the
heading of object-oriented development, to the traditional forms of
functional design, by considering what the system does, rather than
what it does it to. He believes that, due to this risk, Use Cases
should be confined to use as a validation tool, not a design tool, to
avoid closing interfaces prematurely.

So let's say that Use Cases is a useful tool, but one that has a subtle
built-in gotcha. What are the dangers? Naturally, one is negative
impact on code maintainability/reusability.

Another may arise due to the need to balance two competing technical
concerns: the desire to encapsulate abstractions, and the need to make
certain abstractions visible to other modules. With regard to the
dynamics of subprogram calls, the placement of declarations within
modules can greatly affect the locality of reference and thus the
paging behavior of a virtual memory system. Poor locality happens when
subprogram calls occur across segments and lead to cache misses and
page thrashing, which can ultimately slow down the whole system.

Is Zope prey to this? I don't have a clue. Does anyone? Like I say, I'd
be happy to be shown that this concern is unfounded. But this is the
sort of creeping problem which can enter into a growing code base like
Zope's, as it is worked on by numerous people, each devoted to one
module or interface. Combatting such problems probably requires having
someone (someone blessed with both Jim Fulton's dedication to O-O
principles and an unbending zealotry for absolute performance)
committed to periodically flying above the code at a sufficient
distance to see such patterns and making bombing runs against the
target problems. As busy as everyone at DC is, I'm not certain that
anyone is doing that at the moment.