[Zope3-dev] Florent's O-R blog entry
Gary Poster
gary at zope.com
Tue Aug 23 11:29:20 EDT 2005
I recently read Florent's object/relational blog entry at
http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_08_11_object_relational .
It's getting a bit old now, but I
didn't see much discussion (or a way to make a comment) so I thought
I'd bring it up here to invite shared thoughts on his provocative
ideas. Florent spoke of both Zope 2 and Zope 3. Because of my
interests, my current job description, and my choice of mailing list
for this discussion, I'll be speaking exclusively about the Zope 3
side of things. My O/R experience is on a smaller scale than
Florent's (or Ape's) goals, so I offer my responses knowing that I
may need to be corrected.
Florent suggests that a "proper enterprise-grade application server"
using Zope should use an object-relational mapper such as Ape, and
rely on it at its core. He made a number of interesting observations
about how this would allow us to discard the Zope catalog "hack",
store blobs on the filesystem, and take advantage of RDBMS maturity
for managing and analyzing content data and metadata.
While I agree with some of his observations, I believe that Florent's
position--a blanket embrace of O-R underneath ZODB for all
"enterprise" use cases--is overzealous. Large business content
management applications can have many different usage patterns and
many different design characteristics and tradeoffs. An O-R mapping
is one choice that has advantages and disadvantages.
The most serious disadvantage to O/R mapping is that the cost of
creating and maintaining the mapping is not trivial. Requiring an
O/R mapping is a significant barrier to entry, unless you dump all of
the data in something like Ape's 'extra stuff' store--in which case
you've lost many of the compelling advantages of an RDBMS back end in
the first place. This cost could be somewhat alleviated with tools;
however, to my knowledge, the tools do not yet exist. Even with the
tools, it would still be an extra layer of work demanded just to get
things to work.
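As a rough illustration of that extra layer, here is a minimal sketch of
a hand-written mapping for a single content class. It uses the standard
library's sqlite3 module as a stand-in RDBMS; the Document class, the
table layout, and the function names are all hypothetical, and this is
not how Ape itself is configured.

```python
import sqlite3


class Document:
    """Hypothetical content object, used only for illustration."""

    def __init__(self, doc_id, title, body):
        self.doc_id = doc_id
        self.title = title
        self.body = body


# Every attribute of Document shows up three times below: in the schema,
# in the INSERT, and in the SELECT. Adding one attribute to the class
# means touching all three places (plus migrating existing rows).
def create_schema(conn):
    conn.execute(
        "CREATE TABLE document ("
        "doc_id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )


def save_mapped(conn, doc):
    conn.execute(
        "INSERT OR REPLACE INTO document (doc_id, title, body) "
        "VALUES (?, ?, ?)",
        (doc.doc_id, doc.title, doc.body),
    )


def load_mapped(conn, doc_id):
    row = conn.execute(
        "SELECT doc_id, title, body FROM document WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    return Document(*row) if row else None
```

With plain ZODB persistence none of this per-class code is needed, which
is the cost difference the paragraph above describes. Tools could
generate much of it, but the schema would still have to be kept in step
with the classes.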
Also, while I won't confidently assert speed losses as a
disadvantage, it's worth mentioning that mapping code may (will
usually?) introduce more CPU churn (and slower app speeds) than
FileStorage.
In any case, I know there are some cases in which O/R mappings would
be very useful. I do not agree that it is generically the right
approach. It has a cost. Moreover, the advantages Florent listed
are not as clear-cut as he described.
Florent identified three advantages to O/R mapping: according to his
blog, RDBMS indexing is clearly superior to the Zope catalog; blobs
are best handled with mapping code; and content data and metadata are
clearly tabular and so fit within a relational database cleanly and
obviously, providing advantages such as built-in aggregation tools.
He makes some good points, but I have caveats or disagreements with
all three.
First, he identified the Zope catalog as a "hack" for which RDBMS
indexes would be a cure. I don't see how the Zope 3 catalog is a
hack, nor do I necessarily see RDBMS indexes as inherently
advantageous in all cases.
I agree that it is a problem that, given enough indexed objects
and/or enough indexes and/or a small object cache, loading the buckets
when you traverse indexes can flush other objects from the ZODB
cache. If the flushed objects are expensive to load and frequently
used, that can be a noticeable problem. I believe this is a problem
that can be addressed, or at least tuned for given applications.
When it bites us enough that one of us in the community implements a
smarter ZODB cache (or other solution) we'll all win.
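The eviction effect can be sketched with a toy least-recently-used
cache. This is only a simplified model under stated assumptions: the
LRUCache class, the key names, and the capacities are hypothetical, and
the real ZODB object cache is more sophisticated than this.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache; a stand-in, not ZODB's real cache."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be at least 1
        self._items = OrderedDict()
        self.loads = 0  # counts simulated loads from storage

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.loads += 1  # cache miss: pretend to load from storage
        self._items[key] = "object for %r" % (key,)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return self._items[key]

    def __contains__(self, key):
        return key in self._items


def hot_object_survives(capacity, buckets_touched):
    """Load one hot object, traverse index buckets, report if it stayed."""
    cache = LRUCache(capacity)
    cache.get("expensive_document")
    for n in range(buckets_touched):
        cache.get(("index_bucket", n))
    return "expensive_document" in cache
```

In this model, hot_object_survives(4, 10) is False because the bucket
loads push the document out, while hot_object_survives(100, 10) is True.
That is the sense in which the problem can be tuned per application: a
larger cache, or a cache that treats index buckets differently, changes
the outcome.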
It is also true, though Florent did not mention it, that the Zope 3
catalog has no standardized query language or query optimizer. The
first job has some contenders, but the second one has no champions to
my knowledge.
These are not reasons to discard BTrees, or indexes based on them.
They provide some significant advantages. Both common indexing
requirements and new data structures, such as the fascinating RDFLib
that Michel Pelletier has worked on, are handled well by the BTree
code. The BTree code is time-tested, relatively easy to use, and
well maintained. When combined with the transactional virtues of
ZODB, the conflict resolution story reads very well, and very
similarly to that of PostgreSQL (default behavior).
In terms of the actual indexes and catalog design, the Zope 3 text
index is not as featureful as others, but the core algorithms are
equivalent or even superior to many of them. In addition, the
interface system and the catalog design allows integration with other
backends, such as the Lucene text index (as Stephan has illustrated,
I believe). It could even support an index with a RDBMS table back
end, if desired. This might get you some of the advantages you
listed for the O/R back end at a lower cost of entry.
The catalog and index code is not a hack, and is in fact simple,
effective and flexible. Python is the query language, and the lack
of an optimizer is not a reason to go running to an RDBMS index. The
catalog and index code could use polish and even alternate
implementations, but the BTrees, the core code, are fantastic tools.
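To make "Python is the query language" concrete, here is a minimal
sketch of a catalog whose indexes return sets of document ids that the
caller combines in plain Python. The Catalog and FieldIndex classes and
their method names are hypothetical, and ordinary dicts and sets stand
in for the persistent BTree structures the real indexes are built on.

```python
from collections import defaultdict


class FieldIndex:
    """Maps an indexed value to the set of document ids that carry it."""

    def __init__(self):
        self._values = defaultdict(set)

    def index_doc(self, doc_id, value):
        self._values[value].add(doc_id)

    def apply(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._values.get(value, ()))


class Catalog:
    """Holds named indexes and intersects their results."""

    def __init__(self):
        self.indexes = {}

    def index_doc(self, doc_id, **fields):
        for name, value in fields.items():
            index = self.indexes.setdefault(name, FieldIndex())
            index.index_doc(doc_id, value)

    def search(self, **query):
        # No optimizer: indexes are consulted in the order the caller
        # wrote the query, and the results are simply intersected.
        results = None
        for name, value in query.items():
            index = self.indexes.get(name)
            matches = index.apply(value) if index is not None else set()
            results = matches if results is None else results & matches
        return results if results is not None else set()
```

A query such as catalog.search(author="gary", state="published") is then
just a set intersection. Anything more elaborate (unions, differences,
sorting) is ordinary Python on the returned sets, which is both the
flexibility and the missing-optimizer tradeoff discussed above.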
That said, certainly if your data and requirements suggested an RDBMS
back end for other reasons, the advantages of robust and common RDBMS
indexing are compelling. My argument is simply that it is not a
clear-cut win for an O/R mapping.
Another case Florent made for O/R mapping is blob support. I can see
this answering a number of the common use cases for blob support.
However, solutions like Chris McDonough's Zope 2 blob product, to
which Florent linked, seem like they could provide many of the same
advantages, without requiring a full O/R decision for your app. I
don't have enough information to weigh his opinion that the O/R
solution would be simpler than Chris's kind of solution. In any case,
it does not necessarily seem like a clean win for the O/R argument.
The last advantage Florent mentioned for O/R mappings was that the
tabular structure of an RDBMS fit his data better--presumably data
that he felt was representative of what an "enterprise" CMS needs.
I moved from RDBMS systems to the ZODB, so this surprised
me. In my experience, large businesses are very likely to have
interconnected CMS data, one object pointing to another, in a way
that is very well suited to object databases rather than relational
databases. Even in Florent's blog, the two examples of document
hierarchy and (branched) versioning arguably match "classical" object
database advantages better.
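The document-hierarchy case can be sketched both ways for comparison.
In the example below, the Node class, the node table, and both
functions are hypothetical; sqlite3 stands in for an RDBMS, and the
relational version relies on a recursive common table expression, which
is an assumption about what the database supports.

```python
import sqlite3


class Node:
    """Hypothetical content object that points directly at its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)


def object_paths(node, prefix=""):
    """Walk the tree by following object references."""
    path = prefix + "/" + node.name
    yield path
    for child in node.children:
        yield from object_paths(child, path)


def relational_paths(conn):
    """Same walk over an adjacency-list table, via a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, '/' || name FROM node WHERE parent_id IS NULL
            UNION ALL
            SELECT node.id, tree.path || '/' || node.name
            FROM node JOIN tree ON node.parent_id = tree.id
        )
        SELECT path FROM tree ORDER BY path
        """
    )
    return [row[0] for row in rows]
```

Both produce the same paths for the same tree. The object version
follows references that are already part of the data, while the
relational version has to reconstruct the hierarchy from parent ids at
query time.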
Of course, yes, RDBMS systems have many more years of maturity, and
several have many more thousands of dollars spent on them than the
ZODB. It's reasonable to find them compelling, whether for their new
XML features or for their old reliability, stability, and
mathematical efficiencies. But RDBMS designers continue to move to
less table-oriented designs, trying to get many features we already
have, whether they work through object integration or XML
integration. Some Zope applications will significantly benefit from
an O/R mapping, but if you are a Python programmer, the ZODB alone,
with the transparent FileStorage or DirectoryStorage back ends, is
often a compelling, simpler, and reasonable alternative.
In conclusion, the nebulous concept of "enterprise" applications on
Zope does not yield a clear-cut decision for or against an O/R mapper
such as Ape. The cost of O/R mappings is not inconsequential, and
the advantages are not conclusive. I hope that large projects that
the Zope community works on together can support both approaches, and
neither depend on O/R mappers nor exclude them. Florent makes some excellent
observations, and solutions to the problems he identifies could be
done at a number of layers in the code base. Meanwhile, switching
entirely to an O/R back end over FileStorage or DirectoryStorage
feels like a significant case of "throwing the baby out with the bath
water".
Gary