[Zope3-dev] The Proposed Catalog Sprint Agenda is available

Martijn Faassen faassen@vet.uu.nl
Fri, 15 Feb 2002 02:21:27 +0100


Hi there,

I'm not claiming I actually follow most of what's going on, but
this touches on a problem I'm currently dealing with: indexing the
results of XPath queries against ParsedXML documents. A short description
of the problem:

An XPath query returns a list of XML nodes in the ParsedXML DOM tree.

I've implemented something called NodePath which has various schemes
to create more-or-less robust paths (which can be used in URLs) to
particular nodes, and resolve them back again.
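
To make the idea of node paths concrete, here is a minimal sketch of one
possible scheme: a path made of child indexes. It is not the actual NodePath
code. The function names node_to_path and resolve_path are hypothetical, and
the standard library's xml.dom.minidom stands in for the ParsedXML DOM.

```python
from xml.dom import minidom


def node_to_path(node):
    """Return a child-index path such as '0/0/1' from the document to node."""
    steps = []
    while node.parentNode is not None:
        parent = node.parentNode
        # Position of this node among its parent's children.
        index = [i for i, child in enumerate(parent.childNodes)
                 if child is node][0]
        steps.append(str(index))
        node = parent
    return "/".join(reversed(steps))


def resolve_path(document, path):
    """Walk the child indexes in path and return the node they lead to."""
    node = document
    # An empty path refers to the document node itself.
    for step in filter(None, path.split("/")):
        node = node.childNodes[int(step)]
    return node


doc = minidom.parseString(
    "<doc><section><title>One</title><p>text</p></section></doc>")
p = doc.getElementsByTagName("p")[0]
path = node_to_path(p)
same_node = resolve_path(doc, path)
```

A purely positional path like this fits in a URL but breaks as soon as
siblings are inserted or removed, which is why a more robust scheme (for
instance one based on node ids) may be preferable.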

XPath by itself isn't too fast (on a separate development track I'm
working on a new XML storage system based on the ZODB and BTrees that
should alleviate this, but that needs quite a bit of cooking). So, I'd
like to keep an index of common XPath queries.

So, ParsedXML docs can be indexed using a particular XPath query, and then
later on I can ask the index for all nodes (or in fact node paths) that 
were the result of that query, fast. If multiple documents are indexed,
I'll get nodes from all the documents (if they had matching nodes in
the first place).
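
A rough sketch of what such an index could look like follows. Everything in
it is an assumption made for illustration: the class name XPathQueryIndex and
its methods are hypothetical, the limited XPath subset of the standard
library's xml.etree.ElementTree stands in for a real XPath engine, and a
plain dict stands in for persistent storage in the ZODB.

```python
import xml.etree.ElementTree as ET


class XPathQueryIndex:
    """Remember, per document, the node paths matched by one fixed query."""

    def __init__(self, query):
        self.query = query
        self._paths = {}  # document id -> list of node paths

    def index_document(self, doc_id, root):
        # Build a child-index path for every element. iter() visits a
        # parent before its children, so the parent's path is always known.
        paths = {root: ""}
        for parent in root.iter():
            for i, child in enumerate(parent):
                paths[child] = (paths[parent] + "/" + str(i)).lstrip("/")
        # Run the (slow) query once and keep only the resulting paths.
        matches = [paths[element] for element in root.findall(self.query)]
        if matches:
            self._paths[doc_id] = matches
        else:
            self._paths.pop(doc_id, None)

    def unindex_document(self, doc_id):
        self._paths.pop(doc_id, None)

    def results(self):
        """Return (document id, node path) pairs for all indexed documents."""
        return [(doc_id, path)
                for doc_id in sorted(self._paths)
                for path in self._paths[doc_id]]


index = XPathQueryIndex(".//p")
index.index_document(
    "a", ET.fromstring("<doc><section><p>x</p><p>y</p></section></doc>"))
index.index_document("b", ET.fromstring("<doc><note/></doc>"))
index.index_document("c", ET.fromstring("<doc><p>z</p></doc>"))
found = index.results()
```

Asking the index for its results no longer touches the documents at all; the
cost of the query is paid once, at indexing time. Document "b" contributes
nothing because it had no matching nodes.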

I've been trying to wrap my head around how the Zope2 catalog works and
whether I could create a PluginIndex, but my limited understanding of
the thing makes me unsure. In this case, indexing a single document
would actually index a whole list of subobjects (the nodes). A full
text index does something similar, but there the interest is mostly in
whether there's a match, while in my case I'm interested in which nodes
can actually be retrieved.

So I'm currently considering rolling my own.

However, in the longer term:

> 2) it would be nice if we could get relationship and index data from
> different black box data jars (so RDBMS could be used and all kinds of other
> interesting possibilities)

I'd like to add some form of XML storage (in the ZODB presumably) as
another kind of black box data jar for consideration, along with the
previous use case. I'm not sure how much that contributes to the discussion,
but I thought I'd just mention it as it seems a related problem.

Regards,

Martijn