[Zope3-dev] Florent's O-R blog entry

Tue Aug 23 11:29:20 EDT 2005

I recently read Florent's object/relational blog entry at
http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_08_11_object_relational .
It's getting a bit old now, but I didn't see much discussion of it (or
a way to comment on it), so I thought I'd bring it up here to invite
shared thoughts on his provocative ideas.  Florent spoke of both Zope 2
and Zope 3.  Because of my interests, my current job description, and
my choice of mailing list for this discussion, I'll be speaking
exclusively about the Zope 3 side of things.  My O/R experience is on
a smaller scale than the one Florent (or Ape) is aiming at, so I offer
these responses knowing that I may need to be corrected.

Florent suggests that a "proper enterprise-grade application server"  
using Zope should use an object-relational mapper such as Ape, and  
rely on it at its core.  He made a number of interesting observations  
about how this would allow us to discard the Zope catalog "hack",  
store blobs on the filesystem, and take advantage of RDBMS maturity  
for managing and analyzing content data and metadata.

While I agree with some of his observations, I believe that Florent's  
position--a blanket embrace of O-R underneath ZODB for all  
"enterprise" use cases--is overzealous.  Large business content  
management applications can have many different usage patterns and  
many different design characteristics and tradeoffs.  An O-R mapping  
is one choice that has advantages and disadvantages.

The most serious disadvantage of O/R mapping is that creating and
maintaining the mapping is not trivial.  Requiring an O/R mapping is a
significant barrier to entry, unless you dump all of the data into
something like Ape's 'extra stuff' store, in which case you've lost
many of the compelling advantages of an RDBMS back end in the first
place.  Tools could somewhat alleviate this cost; however, to my
knowledge, they do not yet exist.  Even with such tools, the mapping
would still be an extra layer of work demanded just to get things
working.
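To make the cost concrete, here is a minimal sketch of a hand-written mapping for one simple content class, using Python's standard sqlite3 module.  The Document class, the documents table, and the save/load helpers are hypothetical, and a real mapper such as Ape is far more general; the sketch only shows that every persistent attribute needs a column, a write path, and a read path, none of which a FileStorage-backed ZODB application has to write.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical content class; with the ZODB it would simply be stored.
    doc_id: int
    title: str
    body: str


def create_schema(conn: sqlite3.Connection) -> None:
    # Every persistent attribute must be given a column by hand.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )


def save(conn: sqlite3.Connection, doc: Document) -> None:
    # Write path: object attributes -> row.  Adding an attribute means
    # editing the schema, this statement, and load() (plus migrating rows).
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, title, body) VALUES (?, ?, ?)",
        (doc.doc_id, doc.title, doc.body),
    )


def load(conn: sqlite3.Connection, doc_id: int) -> Document:
    # Read path: row -> object.
    row = conn.execute(
        "SELECT id, title, body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return Document(*row)


conn = sqlite3.connect(":memory:")
create_schema(conn)
save(conn, Document(1, "Hello", "First draft"))
```

Adding a single attribute to Document means touching the schema, both SQL statements, and any existing rows; with the ZODB, the new attribute is simply pickled along with the rest of the object.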

Also, while I can't confidently claim that speed loss is a
disadvantage, it's worth mentioning that mapping code may (usually
will?) introduce more CPU overhead, and thus slower applications, than
FileStorage.

In any case, I know there are some situations in which O/R mappings
would be very useful.  I do not agree that O/R mapping is the right
approach in general.  It has a cost.  Moreover, the advantages
Florent listed are not as clear-cut as he described.

Florent identified three advantages to O/R mapping.  According to his
blog, RDBMS indexing is clearly superior to the Zope catalog; blobs
are best handled with mapping code; and content data and metadata are
clearly tabular, so they fit cleanly and obviously into a relational
database, which brings advantages such as built-in aggregation tools.
He makes some good points, but I have caveats or disagreements with
all three.

First, he identified the Zope catalog as a "hack" for which RDBMS  
indexes would be a cure.  I don't see how the Zope 3 catalog is a  
hack, nor do I necessarily see RDBMS indexes as inherently  
advantageous in all cases.

I agree that there is a problem: given enough indexed objects, enough
indexes, or a small enough object cache (or any combination of these),
loading the buckets while traversing indexes can flush other objects
from the ZODB cache.  If the flushed objects are expensive to load and
frequently used, that can be a noticeable problem.  I believe this
problem can be addressed, or at least tuned for a given application.
When it bites us badly enough that someone in the community implements
a smarter ZODB cache (or some other solution), we'll all win.
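A toy model may make the eviction problem clearer.  The sketch below is a simplified LRU cache, not ZODB's actual cache code, and the capacity, object names, and bucket count are all hypothetical: a single index traversal that touches more buckets than the cache can hold pushes out a frequently used object, which then has to be loaded again.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Simplified LRU object cache; a hypothetical stand-in for the ZODB cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.objects = OrderedDict()  # oid -> loaded state, oldest first
        self.loads = 0  # number of (expensive) loads from storage

    def get(self, oid: str) -> str:
        if oid in self.objects:
            self.objects.move_to_end(oid)  # mark as most recently used
            return self.objects[oid]
        self.loads += 1  # cache miss: simulate loading from storage
        self.objects[oid] = f"state of {oid}"
        if len(self.objects) > self.capacity:
            self.objects.popitem(last=False)  # evict least recently used
        return self.objects[oid]


cache = ToyLRUCache(capacity=100)
cache.get("hot-object")            # an expensive, frequently used object
for i in range(150):               # one index traversal touching 150 buckets
    cache.get(f"index-bucket-{i}")
cache.get("hot-object")            # evicted by the scan, so loaded again
```

A smarter cache could, for instance, keep index buckets from displacing objects that are used far more often; that is the kind of tuning the paragraph above has in mind, not something this toy implements.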

It is also true, though Florent did not mention it, that the Zope 3
catalog has no standardized query language or query optimizer.  The
first gap has some contenders; the second, to my knowledge, has no
champions.

These are not reasons to discard BTrees, or indexes based on them.   
They provide some significant advantages.  Both common indexing  
requirements and new data structures, such as the fascinating RDFLib  
that Michel Pelletier has worked on, are handled well by the BTree  
code.  The BTree code is time-tested, relatively easy to use, and
well maintained.  Combined with the transactional virtues of ZODB, it
gives a conflict resolution story that reads very well, and that
closely resembles PostgreSQL's default behavior.
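As an illustration of what ZODB-level conflict resolution looks like, here is a sketch of a counter modeled on the approach BTrees.Length takes.  ZODB calls a persistent object's `_p_resolveConflict(oldState, savedState, newState)` method when two transactions modify it concurrently; the class below is a plain Python class (hypothetically named ConflictFreeCounter) so that it runs without ZODB installed, and the method is called directly rather than by ZODB.

```python
class ConflictFreeCounter:
    """Counter whose concurrent increments merge instead of conflicting.

    Hypothetical sketch modeled on BTrees.Length.  In real use it would
    subclass persistent.Persistent and store its integer value as its
    state; it is a plain class here so it runs without ZODB installed.
    """

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def _p_resolveConflict(self, old_state: int, saved_state: int,
                           new_state: int) -> int:
        # old_state:   the state both transactions started from
        # saved_state: the state already committed by the other transaction
        # new_state:   the state this transaction is trying to commit
        # Keeping both deltas means neither transaction's change is lost.
        return saved_state + new_state - old_state


counter = ConflictFreeCounter(10)
# Transaction A added 3 and committed (13); transaction B added 5 from the
# same starting point (15).  ZODB would ask B's object to merge the states:
merged = counter._p_resolveConflict(10, 13, 15)
```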

In terms of the actual indexes and catalog design, the Zope 3 text
index is not as featureful as others, but its core algorithms are
equivalent or even superior to those of many of them.  In addition,
the interface system and the catalog design allow integration with
other back ends, such as the Lucene text index (as Stephan has
illustrated, I believe).  They could even support an index backed by
an RDBMS table, if desired.  This might provide some of the advantages
Florent listed for the O/R back end at a lower cost of entry.
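For example, an index with a relational table behind it might look something like the following sketch.  The method names index_doc, unindex_doc, clear, and apply are modeled on those in Zope 3's zope.index interfaces (IInjection and IIndexSearch), but this hypothetical SQLFieldIndex class does not formally declare those interfaces, and the SQLite table layout is an assumption for illustration only.

```python
from __future__ import annotations

import sqlite3


class SQLFieldIndex:
    """Hypothetical field index whose entries live in an SQL table.

    Method names follow the zope.index interfaces; the SQLite storage
    is an assumption made for illustration.
    """

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self.conn = conn
        self.table = f"idx_{name}"  # name is assumed to be a trusted identifier
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(docid INTEGER PRIMARY KEY, value TEXT)"
        )
        # Let the RDBMS maintain its own index on the indexed values.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS {self.table}_value "
            f"ON {self.table} (value)"
        )

    def index_doc(self, docid: int, value: str) -> None:
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (docid, value) VALUES (?, ?)",
            (docid, value),
        )

    def unindex_doc(self, docid: int) -> None:
        self.conn.execute(f"DELETE FROM {self.table} WHERE docid = ?", (docid,))

    def clear(self) -> None:
        self.conn.execute(f"DELETE FROM {self.table}")

    def apply(self, query: tuple[str, str]) -> set[int]:
        # query is an inclusive (min, max) range, as in a field index search.
        low, high = query
        rows = self.conn.execute(
            f"SELECT docid FROM {self.table} WHERE value BETWEEN ? AND ?",
            (low, high),
        )
        return {docid for (docid,) in rows}


index = SQLFieldIndex(sqlite3.connect(":memory:"), "author")
index.index_doc(1, "florent")
index.index_doc(2, "gary")
index.index_doc(3, "stephan")
```

The catalog would see only the index methods; whether the entries live in BTrees or in a relational table is hidden behind them.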

The catalog and index code is not a hack; it is in fact simple,
effective, and flexible.  Python is the query language, and the lack
of an optimizer is not a reason to go running to an RDBMS index.  The
catalog and index code could use polish and even alternative
implementations, but the BTrees at its core are fantastic tools.
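To illustrate what "Python is the query language" means in practice: with field indexes that map values to sets of document ids, a query is ordinary Python over the result sets.  This sketch uses the standard library's bisect module and built-in sets as stand-ins for the BTrees package (which provides sorted mappings and set operations such as intersection); the FieldIndex class and the sample documents are hypothetical.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class FieldIndex:
    """Hypothetical field index: value -> set of docids, keys kept sorted.

    A stand-in for a BTree-based index; the BTrees package would keep the
    keys sorted in its own balanced tree instead of a bisect-managed list.
    """

    def __init__(self) -> None:
        self._docids: dict[str, set[int]] = defaultdict(set)
        self._keys: list[str] = []  # sorted, distinct indexed values

    def index_doc(self, docid: int, value: str) -> None:
        if value not in self._docids:
            bisect.insort(self._keys, value)
        self._docids[value].add(docid)

    def apply(self, low: str, high: str) -> set[int]:
        # Inclusive range search over the sorted keys.
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        result: set[int] = set()
        for key in self._keys[start:stop]:
            result |= self._docids[key]
        return result


# Hypothetical sample documents: (workflow state, creation date).
docs = [("published", "2005-08-01"), ("private", "2005-08-11"),
        ("published", "2005-08-20"), ("published", "2005-07-30")]
status, created = FieldIndex(), FieldIndex()
for docid, (state, date) in enumerate(docs):
    status.index_doc(docid, state)
    created.index_doc(docid, date)

# "Published documents created in August 2005" is plain Python set logic:
hits = (status.apply("published", "published")
        & created.apply("2005-08-01", "2005-08-31"))
```

An optimizer would mostly be deciding things this code leaves to the programmer, such as which index to consult first so that the intersections stay small.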

That said, if your data and requirements already suggest an RDBMS back
end for other reasons, the advantages of robust, widely used RDBMS
indexing are certainly compelling.  My argument is simply that
indexing is not a clear-cut win for O/R mapping.

Another case Florent made for O/R mapping is blob support.  I can see
this answering many of the common use cases for blobs.  However,
solutions like Chris McDonough's Zope 2 blob product, to which Florent
linked, seem like they could provide many of the same advantages
without requiring a full O/R decision for your application.  I don't
have enough information to weigh Florent's opinion that the O/R
solution would be simpler than Chris' kind of solution.  In any case,
it is not necessarily a clean win for the O/R argument.
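The general idea behind a filesystem blob approach can be sketched without any O/R layer: the content object keeps only a small reference, and the bytes live in a file.  This is a hypothetical illustration of that pattern, not the API of Chris McDonough's product; the BlobRef and FileBlobStore names, the SHA-1 naming scheme, and the directory layout are all assumptions.  A real implementation would also need to tie file writes to the ZODB transaction so that an aborted transaction does not leave orphaned or half-written files.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BlobRef:
    """Small, picklable reference stored on the content object (hypothetical)."""
    digest: str
    size: int


class FileBlobStore:
    """Stores blob bytes in files named by their SHA-1 digest (hypothetical)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, digest: str) -> Path:
        # Fan out into subdirectories to keep directory sizes manageable.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> BlobRef:
        digest = hashlib.sha1(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # identical content is stored only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return BlobRef(digest, len(data))

    def get(self, ref: BlobRef) -> bytes:
        return self._path(ref.digest).read_bytes()


store = FileBlobStore(Path(tempfile.mkdtemp()) / "blobs")
ref = store.put(b"pretend this is a large PDF")
```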

The last advantage Florent mentioned for O/R mappings was that the
tabular structure of an RDBMS better fits his data, presumably data he
considers representative of what an "enterprise" CMS needs.  Having
moved from relational databases to the ZODB myself, I was surprised by
this.  In my experience, large businesses are very likely to have
interconnected CMS data, with one object pointing to another, in a way
that suits object databases much better than relational databases.
Even in Florent's blog, the two examples of document hierarchy and
(branched) versioning arguably match "classical" object database
advantages better.
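To show why hierarchy and branched versioning map naturally onto objects, here is a minimal sketch in which each version simply holds a reference to its predecessor and each folder holds references to its children.  In the ZODB these would be persistent objects and the references would be stored transparently; the Version and Folder classes are hypothetical.  A relational design for the same structure typically needs self-referencing foreign keys and recursive or repeated queries to walk it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """Hypothetical document version pointing at the version it came from."""
    text: str
    parent: Optional[Version] = None  # direct object reference, no join

    def history(self) -> list[str]:
        # Walk back along the chain of object references.
        node: Optional[Version] = self
        texts = []
        while node is not None:
            texts.append(node.text)
            node = node.parent
        return texts


@dataclass
class Folder:
    """Hypothetical container holding its children by name."""
    name: str
    children: dict[str, object] = field(default_factory=dict)


root = Folder("root")
policies = Folder("policies")
root.children["policies"] = policies

v1 = Version("draft")
v2 = Version("reviewed", parent=v1)
branch = Version("legal edits", parent=v1)  # a second branch from v1
policies.children["travel"] = v2
```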

Of course, relational databases have many more years of maturity than
the ZODB, and several have had far more money invested in them.  It's
reasonable to find them compelling, whether for their new XML features
or for their long-standing reliability, stability, and mathematical
efficiencies.  But RDBMS designers continue to move toward less
table-oriented designs, through object integration or XML integration,
in pursuit of many features we already have.  Some Zope applications
will benefit significantly from an O/R mapping, but if you are a
Python programmer, the ZODB alone, with the transparent FileStorage or
DirectoryStorage back ends, is often a compelling, simpler, and
reasonable alternative.

In conclusion, the nebulous concept of "enterprise" applications on
Zope does not dictate a clear-cut decision for or against an O/R
mapper such as Ape.  The cost of O/R mappings is not inconsequential,
and the advantages are not conclusive.  I hope that the large projects
the Zope community works on together can support both approaches,
neither depending on nor excluding O/R mappers.  Florent makes some
excellent observations, and solutions to the problems he identifies
could be implemented at a number of layers in the code base.
Meanwhile, switching entirely to an O/R back end instead of
FileStorage or DirectoryStorage feels like a significant case of
"throwing the baby out with the bathwater".

Gary