[ZODB-Dev] CMS on top of ZODB

Wed, 19 Feb 2003 00:10:49 +0100

Dieter Maurer wrote:
[snip description]
> This sounds very interesting.

Thanks! I'd be more than willing to talk more about it. Practical use of it
is some time away, depending on the amount of time I have (or contributions
of course). Let me talk about some of the problems/drawbacks:

  * documents are only mutable as a whole; you can't change parts of one
    through the DOM (i.e. the DOM is read-only)

  * the node id scheme, which is essential to the whole exercise, runs out of
    integers rather quickly. This means it can deal with documents the size
    of a Shakespeare play, but the Old Testament is just a trifle too big
    and causes it to run out of ids. I'd like to switch to 64-bit ints
    for this (but of course IIBTrees don't work then..) or consider other
    solutions. The node id scheme also makes rather wonderful things possible
    though, such as storing the complete document structure in a single
    IIBTree (with a few more ints per level in the tree), and constant-time
    checks to determine ancestor/descendant relationships between nodes.

    Oh, and 'joins' of ancestors/descendants don't even have to hit the
    ZODB to do their work, except to simply get the list of ancestor ids and
    descendant ids out. The goal is an XPath implementation with only a
    minimum of actual tree walking; mostly it should be a fast indexed
    database join.

  * Incompleteness. XPath location steps basically work (and the core is
    very fast) but XPath predicates only have the beginnings of support.
    Adding more functionality to this bare bones engine would slow 
    it down without some more complicated query optimization work, which
    I'm currently considering. Multi-document XPath queries are also something
    that still needs to be designed.

    Also some XML stuff is still missing, such as comment nodes, which
    would be simple to add. Entities and DTDs are not being looked at right
    now. It is fully namespace-aware, however, and does the basic stuff like
    elements, attributes and text nodes.
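
To illustrate the constant-time ancestor/descendant check and the id-only
join described above, here is a minimal sketch. It uses pre-order interval
numbering, which is an assumption chosen for illustration and not necessarily
the node id scheme used in this project. The tree layout, the node names, and
the helpers number_tree, is_ancestor and join are all hypothetical, and plain
dicts stand in for IIBTrees.

```python
from bisect import bisect_right

def number_tree(tree):
    """Give every node a (start, end) pair of ints from a pre-order walk.

    `tree` is a (name, children) tuple; names are assumed unique.
    `end` is the largest `start` found in the node's subtree.
    """
    intervals = {}
    counter = 0

    def walk(node):
        nonlocal counter
        name, children = node
        counter += 1
        start = counter
        for child in children:
            walk(child)
        intervals[name] = (start, counter)

    walk(tree)
    return intervals

def is_ancestor(intervals, a, d):
    """Constant-time check: two int comparisons, no tree walking."""
    a_start, a_end = intervals[a]
    d_start = intervals[d][0]
    return a_start < d_start <= a_end

def join(intervals, ancestors, descendants):
    """Pair each candidate ancestor with its candidate descendants by id only."""
    by_start = sorted((intervals[d][0], d) for d in descendants)
    keys = [start for start, _ in by_start]
    pairs = []
    for a in ancestors:
        a_start, a_end = intervals[a]
        lo = bisect_right(keys, a_start)   # first start greater than a_start
        hi = bisect_right(keys, a_end)     # one past the last start <= a_end
        pairs.extend((a, d) for _, d in by_start[lo:hi])
    return pairs

tree = ("play", [
    ("act1", [("scene1", []), ("scene2", [])]),
    ("act2", [("scene3", [])]),
])
intervals = number_tree(tree)
pairs = join(intervals, ["act1", "act2"], ["scene1", "scene2", "scene3"])
```

Once the two id lists are in hand, the join only compares integers, which is
the property that lets a location step like act//scene run as an index lookup
rather than a walk over the document.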

Nice features I haven't mentioned yet are the possibility to read in
documents through SAX (could use some performance optimization work though;
the node id indexing slows it down), and the possibility to output documents
as SAX events again. You can pass those into the Python SAX library's
class that generates XML from SAX events, which is neat.
Oh, and XPath already works going through the read-only DOM; I simply use
PyXML's XPath. This is slow and there may be bugs though.
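
As a rough sketch of the SAX output path, the standard library's
xml.sax.saxutils.XMLGenerator is a content handler that writes XML text from
the SAX events it receives. In the example below the stock SAX parser stands
in for the database as the event source; that substitution and the roundtrip
helper are assumptions made for illustration only.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

def roundtrip(xml_text):
    """Feed SAX events into XMLGenerator and return the regenerated XML."""
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    # Any producer of SAX events could drive `handler`; here it is the parser.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()

result = roundtrip("<play><act n='1'>Enter Hamlet</act></play>")
```

The regenerated text carries an XML declaration and normalized attribute
quoting, so it is equivalent to the input rather than byte-identical.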

Anyway, I'd love to get some help on this. :) Python needs a good
XML database..

Regards,

Martijn