[Zope3-dev] Re: RDFLib and Zope 3

Tue Aug 23 16:00:56 EDT 2005

On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
> Michel (and anyone else with experience with RDFLib on the list), I  
> recently looked at RDFLib (http://rdflib.net/) and came away (after  
> an hour or so) with a good first impression.

Great.  I've cc:ed Dan Krech, the lead rdflib developer on this mail.
For his benefit I might explain things that you obviously know.

> My biggest disappointment was that, from the perspective of a Zope 3  
> developer, using it alongside other Zope 3 indexes (and other intid- 
> based data structures) meant that I would have to externally convert  
> to and from RDF in order to merge results and convert the RDF URIs to  
> objects. 

Correct.  A specific and important optimization in Zope-style cataloging
is that each object gets a cheap unique integer, which reduces the
catalog footprint and significantly speeds up merging and joining
results.  These integers are exposed through a utility component in Zope.
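
For anyone on the cc: list who hasn't seen this pattern, here is a rough
sketch of the idea in plain Python.  It is not Zope code: ToyIntIds and
its methods are made-up names, and ordinary sets stand in for the
integer-keyed BTree sets a real catalog index would return.

```python
# Illustrative only: plain Python, no Zope imports.  ToyIntIds is a
# hypothetical stand-in for an intid utility (object <-> small integer).

class ToyIntIds:
    def __init__(self):
        self._by_obj = {}   # object -> integer id
        self._by_id = {}    # integer id -> object

    def register(self, obj):
        """Assign the next free integer to obj, or return its existing id."""
        if obj not in self._by_obj:
            uid = len(self._by_obj) + 1
            self._by_obj[obj] = uid
            self._by_id[uid] = obj
        return self._by_obj[obj]

    def get_object(self, uid):
        return self._by_id[uid]


intids = ToyIntIds()
docs = ["report.txt", "notes.txt", "todo.txt"]
ids = {name: intids.register(name) for name in docs}

# Each "index" hands back a set of integers rather than objects.
text_hits = {ids["report.txt"], ids["notes.txt"]}
author_hits = {ids["notes.txt"], ids["todo.txt"]}

# Merging two result sets is a cheap integer-set intersection; objects
# are only looked up for the ids that survive the merge.
merged = text_hits & author_hits
results = [intids.get_object(uid) for uid in sorted(merged)]
```

The point is that the indexes never touch the objects themselves; only
the final, already-merged ids are resolved back to objects.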

>  It would be much more efficient if I could have an RDF  
> resource class that represented an intid, and even more efficient if  
> I could get IFBTrees back directly from searches that somehow  
> included the intids.  

Yes, this is a problem that needs to be solved, and your suggestion is
one way to solve it.  I've discussed this a few times with Florent at the
Paris and EUpy sprints and he had a similar suggestion.  

I'm uncomfortable with it for a few reasons: 1) intids are such a
Zope-catalog-optimization specific thing.  I know why they are
exposed, so that catalog results can be efficiently merged, but they
don't have anything to do with RDF, so 2) rdflib can't really change its
interface to accommodate them.  Also, 3) they are backend specific: for
example, rdflib has a URI -> integer mapping for its in-memory and ZODB
backends to reduce footprint, but a SQL backend would need no such
integer; you would in fact have to *add* a column to hold that value
just so the data would merge efficiently with a catalog.  This seems
antithetical to Zope 3's philosophy in general, as it violates the
concept of not requiring third party libs and data to change themselves
significantly just to work with Zope.  Of course, this isn't a problem
of the catalog; it's a general problem of merging search results from
anywhere.

I'd like to make the optimization available so that searches on a graph
can be efficiently merged with searches on a catalog, but I don't think
it can be done by pushing intids down into rdflib, or for that matter
any other third party component you want to play with the catalog
efficiently.  Perhaps instead of pushing the integers down we could push
URIs up: Zope's cataloging could grow another layer of indirection on
top of intids and provide a URI utility that maps to intids.  Of course
you might object to that for the same reasons I'm objecting to this. ;)
But at least URIs are a well known standard.
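
To make the "push URIs up" idea a little more concrete, here is a
minimal sketch of what such a URI utility might look like.  Everything
in it is an assumption for illustration: ToyUriIds, bind, ids_for and
uris_for are invented names, not an existing Zope or rdflib API, and the
example.org URIs are placeholders.

```python
# Hypothetical sketch: a utility that sits on top of intids and
# translates between URIs (what a graph speaks) and integer ids
# (what a catalog speaks).  Plain Python, no Zope or rdflib imports.

class ToyUriIds:
    def __init__(self):
        self._uri_to_id = {}
        self._id_to_uri = {}

    def bind(self, uri, intid):
        """Record that this URI names the object with this intid."""
        self._uri_to_id[uri] = intid
        self._id_to_uri[intid] = uri

    def ids_for(self, uris):
        """Translate URIs to intids, skipping URIs the utility doesn't know."""
        return {self._uri_to_id[u] for u in uris if u in self._uri_to_id}

    def uris_for(self, ids):
        """Translate intids back to URIs."""
        return {self._id_to_uri[i] for i in ids if i in self._id_to_uri}


uri_ids = ToyUriIds()
for n in (1, 2, 3):
    uri_ids.bind("http://example.org/doc/%d" % n, n)

# The graph search returns URIs; the catalog search returns intids.
graph_hits = [
    "http://example.org/doc/1",
    "http://example.org/doc/2",
    "http://example.org/elsewhere",   # not a cataloged object
]
catalog_hits = {2, 3}

# Translate once at the boundary, then merge as integers.
merged = uri_ids.ids_for(graph_hits) & catalog_hits
merged_uris = uri_ids.uris_for(merged)
```

With this shape the graph backend never learns about intids; the
translation cost is paid once per search, at the boundary.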

Somewhat at right angles to this, I think Zope needs to grow another
search interface, a higher level one that hides all of this integer id
stuff from the user.  I proposed something incomplete along these lines
to the z3labs site, an interface that could aggregate searches across
multiple registered search sources, whether catalogs, rdflib Graphs,
relational databases, remote systems, Google, etc.  

With something like this, there is no need to worry about intersecting
two integer-keyed result sets efficiently; the underlying search
framework performs that optimization if it is available.  Note that the primary
benefit of such an interface is not necessarily merging results across
multiple sources, but instead providing a consistent interface
regardless of the search source.
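
A very rough sketch of the shape such an interface could take follows.
This is not the z3labs proposal itself; SearchSource, search_ids,
ToyCatalog, ToyGraph and search_all are hypothetical names, and the
sketch assumes all id-capable sources share one integer id space.

```python
from abc import ABC, abstractmethod


class SearchSource(ABC):
    """Hypothetical common interface for anything that can be searched."""

    @abstractmethod
    def search(self, query):
        """Return a set of result URIs for the query."""

    def search_ids(self, query):
        """Optional fast path: a set of integer ids, or None if unsupported."""
        return None


class ToyCatalog(SearchSource):
    def __init__(self, index, uri_by_id):
        self._index = index          # query -> set of integer ids
        self._uri_by_id = uri_by_id  # integer id -> URI

    def search_ids(self, query):
        return set(self._index.get(query, ()))

    def search(self, query):
        return {self._uri_by_id[i] for i in self.search_ids(query)}


class ToyGraph(SearchSource):
    def __init__(self, index):
        self._index = index          # query -> set of URIs

    def search(self, query):
        return set(self._index.get(query, ()))


def search_all(sources, query, uri_by_id):
    """Intersect results from all sources; callers only ever see URIs."""
    id_sets = [s.search_ids(query) for s in sources]
    fast = [ids for ids in id_sets if ids is not None]
    slow = [s for s, ids in zip(sources, id_sets) if ids is None]
    result = None
    if fast:
        # Integer fast path, used only where a source offers it.
        result = {uri_by_id[i] for i in set.intersection(*fast)}
    for source in slow:
        hits = source.search(query)
        result = hits if result is None else result & hits
    return result if result is not None else set()


uri_by_id = {1: "urn:doc:1", 2: "urn:doc:2", 3: "urn:doc:3"}
text = ToyCatalog({"zope": {1, 2}}, uri_by_id)
tags = ToyCatalog({"zope": {2, 3}}, uri_by_id)
graph = ToyGraph({"zope": {"urn:doc:2", "urn:doc:9"}})

hits = search_all([text, tags, graph], "zope", uri_by_id)
```

The caller passes a query and gets URIs back; whether integers were
used underneath is the framework's business.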

> Then I could leverage the relationship and  
> keyword capabilities of RDFLib while also merging results efficiently  
> with other index-like data structures in Zope 3.  The intid-specific  
> resources could even have stable URI representations without too much  
> trouble, so that they could be exported and imported with RDFLib, if  
> desired.

Hmm so these resource objects you are suggesting, they would be
persistent objects?  I don't quite have the picture of what you suggest.
Perhaps these resource classes can be managed by a utility?

> Have you thought about that use case?  If one used a variation of  
> your back end that assigned intids to non-intid-based resources like  
> URIs and Literals and stored the relationships via intids, 

One doesn't need a variation; this is exactly the way the in-memory and
ZODB backends work now, as an optimization.  But those integers are
internal details of the implementation of those backends.
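
To illustrate what "internal detail" means here, a toy store along
these lines might look like the sketch below.  This is not rdflib's
actual backend code; ToyInternedStore and its methods are invented for
illustration, and terms are plain strings rather than rdflib objects.

```python
# Toy triple store: terms are interned to integers internally, but the
# public methods only ever accept and return the terms themselves.

class ToyInternedStore:
    def __init__(self):
        self._id_of = {}       # term -> integer (private)
        self._term_of = []     # integer -> term (private)
        self._triples = set()  # set of (int, int, int) tuples

    def _intern(self, term):
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def add(self, s, p, o):
        self._triples.add(
            (self._intern(s), self._intern(p), self._intern(o)))

    def subjects(self, p, o):
        """Return the subjects of all triples matching (?, p, o)."""
        pid = self._id_of.get(p)
        oid = self._id_of.get(o)
        if pid is None or oid is None:
            return set()
        return {self._term_of[s]
                for (s, pp, oo) in self._triples
                if pp == pid and oo == oid}


store = ToyInternedStore()
store.add("urn:doc:1", "urn:p:author", "Michel")
store.add("urn:doc:2", "urn:p:author", "Gary")
found = store.subjects("urn:p:author", "Michel")
```

Nothing outside the class can see or depend on the integers, which is
why a different backend is free not to have them at all.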

> you could  
> store the data as IFBTrees and offer up an API to get "raw" IFBTree  
> results.  Any obvious ways that would be a problem?  Does it feel  
> reasonable to you?  Any suggestions?

Well not any good ones yet, although I know it's an important problem.
I'll have to think about it a bit more.  Do you understand my
objections?  Does anyone else have any suggestions out there?  This is
probably worth solving in the general case, since it's going to come up
anytime you're going to want to merge catalog results with anything.

> I'm generally interested in RDFLib, your use of it, and your hopes  
> for it, if you feel like holding forth. :-)

Great!  And I didn't even have to feed you any kool aid or buy you a
bottle of aquavit. ;)

-Michel