[Zope3-dev] Re: April Zope3 sprint in Louvain-la-Neuve Belgium

Martijn Faassen faassen@vet.uu.nl
Mon, 24 Feb 2003 18:58:17 +0100


Godefroid Chapelle wrote:
> Paul Everitt and I would like to propose to also work on XML integration 
> in Z3 framework (this means at least looking how to integrate XSLT and 
> XMLSchemas).
> We are willing to be sparkles on this project so that interested people 
> could take the lead as we are already working on Z3MI.

Okay, this certainly makes it a lot more likely that I'll show up.
Infrae has a lot of expertise and a lot of interest in XML and Zope,
so all of a sudden it's more or less required that I come and give some
input. It'll be cutting it tight though, as I'll be just barely back from
the US and PyCon. I need to discuss things here.

I have years of experience working with XML in Zope... First
XMLDocument, then ParsedXML. I maintain ParsedXML and know where it
sucks. :) Silva uses XML all over the place and we're moving into
using XPath. We don't have a lot of expertise on XSLT though.

Infrae is also working on Forest, an XML database. I hope it will 
eventually be plugged into Zope 3. Here is what Forest will be able to do,
with wild estimates of how complete the implementation is:

  * Store XML in ZODB efficiently using a BTree: 80% (can't pickle some
    C-based structures yet, and no performance tuning yet; would also like
    to have IIBTrees that can work with long long (64-bit) ints)

  * Read-only DOM: 95% (can already use PyXML XPath with this)

  * Read in documents using SAX: 85% (processing instructions and comment
    nodes are still missing, but namespaces work)

  * Output documents using SAX: 85% (same story)

  * Output documents as XML: 95%

  * XML:DB API compliant: 30%

  * Highly optimized XPath implementation using BTrees and indexes, the
    result of a lot of research: 20% (parts of XPath work right now, in
    particular location steps and all axes but the namespace axis, and
    some steps are highly optimized, but a lot is still missing)
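
To make the SAX items above concrete, here is a minimal sketch of a namespace-aware SAX reader written against Python's standard xml.sax module. It is not Forest code: the EventLog class, the read_events helper and the sample document are made up for illustration. It only shows the kinds of events (namespaced elements, processing instructions) a SAX-based loader has to deal with.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class EventLog(ContentHandler):
    """Hypothetical handler that records SAX events as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name  # name is a (namespace URI, local name) pair
        self.events.append(("start", uri, local))

    def endElementNS(self, name, qname):
        uri, local = name
        self.events.append(("end", uri, local))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


def read_events(xml_bytes):
    """Parse xml_bytes with namespace processing on; return the event list."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = EventLog()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


# Sample input invented for this sketch.
events = read_events(
    b'<d:doc xmlns:d="urn:example"><?note hi?><d:p>text</d:p></d:doc>'
)
```

Comment nodes are not reported through ContentHandler at all; in xml.sax they need a separate lexical handler, which is one reason they tend to be added last.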

Two focuses:

  * Efficient storage of XML in the ZODB. A document's structure can be
    represented using a single IIBTree plus a few other numbers. 

  * Very fast XPath queries. The structure allows certain operations
    (database style joins in particular, and ancestor/descendant checks) to
    run very quickly. There's a C core that helps with this.
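
One well-known way an integer-to-integer mapping can encode document structure and give cheap ancestor/descendant checks is pre/post-order numbering. The sketch below illustrates that general technique only; it is an assumption, not a description of Forest's actual layout. A plain dict stands in for the IIBTree, and number_tree, is_ancestor and descendants are hypothetical names.

```python
import xml.etree.ElementTree as ET


def number_tree(root):
    """Map each node's pre-order number to its post-order number.

    Returns (post_of, tag_of); both are keyed by pre-order number.
    A real store would keep post_of in an IIBTree instead of a dict.
    """
    post_of = {}
    tag_of = {}
    counters = {"pre": 0, "post": 0}

    def visit(elem):
        pre = counters["pre"]
        counters["pre"] += 1
        tag_of[pre] = elem.tag
        for child in elem:
            visit(child)
        post_of[pre] = counters["post"]
        counters["post"] += 1

    visit(root)
    return post_of, tag_of


def is_ancestor(post_of, a, d):
    """True if node a is a proper ancestor of node d (both pre-order numbers).

    An ancestor starts before its descendant and finishes after it,
    so the check is two integer comparisons and no tree walk.
    """
    return a < d and post_of[a] > post_of[d]


def descendants(post_of, a):
    """All descendants of node a, in document order."""
    return [d for d in sorted(post_of) if is_ancestor(post_of, a, d)]


doc = ET.fromstring("<doc><sec><p/><p/></sec><sec><p/></sec></doc>")
post_of, tag_of = number_tree(doc)
```

With this numbering, a join between two node sets (say, all "sec" nodes and all "p" nodes) reduces to comparing pairs of integers, which is the kind of operation a C core can do very quickly.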

The thing is intended to be an efficient way to store XML documents and
query into them. Basically the undercarriage of a Zope 3 version of
Silva. I don't tend to care much about the other side of Zope and XML,
XML messaging; I care about storing structured information.

One thing it is lacking compared to XMLDocument or ParsedXML is that it
doesn't allow documents to be mutated through a DOM -- documents have
to be replaced as a whole in order to change them. For this use case
(which the Silva editor needs) I imagine a Python DOM can be used (with a save
into Forest when you're "done editing").
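
A rough sketch of that edit-then-save workflow, using Python's standard xml.dom.minidom as the mutable DOM. The store dict, save_document and edit_document are hypothetical stand-ins for the read-only document store, not Forest's API.

```python
from xml.dom import minidom

# Hypothetical stand-in for a store that only accepts whole documents.
store = {}


def save_document(name, xml_text):
    """Replace the stored document as a whole."""
    store[name] = xml_text


def edit_document(name, edit):
    """Load a document into a mutable DOM, apply edit, save it back whole."""
    dom = minidom.parseString(store[name])
    edit(dom)
    save_document(name, dom.documentElement.toxml())
    dom.unlink()  # release the temporary DOM


def add_paragraph(dom):
    """Example edit: append a <p> element to the document root."""
    p = dom.createElement("p")
    p.appendChild(dom.createTextNode("new text"))
    dom.documentElement.appendChild(p)


save_document("page", "<doc><p>old text</p></doc>")
edit_document("page", add_paragraph)
```

The stored form never sees partial edits: the editor works on a throwaway DOM, and the store receives one complete replacement document at the end.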

Lots of work to be done still but we're making progress.

For XSLT integration I would recommend looking into Ariel Partner's 
XSLT Transform product for Zope 2 and translating that to a Zope 3
design.

Regards,

Martijn