[Zope3-dev] Re: RDFLib and Zope 3

Tue Aug 23 16:26:42 EDT 2005


Michel Pelletier wrote:
> On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
> 
>>Michel (and anyone else with experience with RDFLib on the list), I  
>>recently looked at RDFLib (http://rdflib.net/) and came away (after  
>>an hour or so) with a good first impression.
> 
> 
> Great.  I've cc:ed Dan Krech, the lead rdflib developer on this mail.
> For his benefit I might explain things that you obviously know.
> 
> 
>>My biggest disappointment was that, from the perspective of a Zope 3  
>>developer, using it alongside other Zope 3 indexes (and other intid- 
>>based data structures) meant that I would have to externally convert  
>>to and from RDF in order to merge results and convert the RDF URIs to  
>>objects. 
> 
> 
> Correct.  A specific and important optimization in Zope-style cataloging
> is that objects have a cheap unique integer to reduce catalog footprint
> and significantly improve result merging and joining.  These integers
> are exposed as a utility component in Zope.
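To make the merging argument concrete, here is a minimal sketch of why integer ids make catalog-style result merging cheap. It uses plain Python sets of ints as a stand-in for the BTrees `IFBTree` structures the catalog actually uses. `IntIds` and `merge_and` are hypothetical illustrations, not the real `zope.intid` API.

```python
import itertools


class IntIds:
    """Hypothetical stand-in for an intid utility: object <-> small integer."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._ids = {}      # object -> int
        self._objects = {}  # int -> object

    def register(self, obj):
        """Return obj's integer id, assigning a new one on first use."""
        if obj in self._ids:
            return self._ids[obj]
        uid = next(self._counter)
        self._ids[obj] = uid
        self._objects[uid] = obj
        return uid

    def getObject(self, uid):
        return self._objects[uid]


def merge_and(*results):
    """Intersect several result sets of integer ids (a catalog-style AND)."""
    if not results:
        return set()
    # Start from the smallest set so the running intersection stays small.
    ordered = sorted(results, key=len)
    merged = set(ordered[0])
    for other in ordered[1:]:
        merged &= other
    return merged


ids = IntIds()
text_hits = {ids.register("doc-a"), ids.register("doc-c")}
date_hits = {ids.register("doc-b"), ids.register("doc-c")}
hits = merge_and(text_hits, date_hits)
print([ids.getObject(uid) for uid in sorted(hits)])  # ['doc-c']
```

Only the final step touches real objects; every index result is merged as bare integers.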
> 
> 
>> It would be much more efficient if I could have an RDF  
>>resource class that represented an intid, and even more efficient if  
>>I could get IFBTrees back directly from searches that somehow  
>>included the intids.  
> 
> 
> Yes, this is a problem that needs to be solved, and your suggestion is
> one way to solve it.  I've discussed this a few times with Florent at the
> Paris and EUpy sprints, and he had a similar suggestion.
> 
> I'm uncomfortable with it for a few reasons: 1) intids are such a
> Zope-catalog-optimization-specific thing.  I know why they are
> exposed, so that catalog results can be efficiently merged, but they
> don't have anything to do with RDF, so 2) rdflib can't really change its
> interface to accommodate them.  Also, 3) they are backend specific; for
> example, rdflib has a URI -> integer mapping for its in-memory and ZODB
> backends to reduce footprint, but a SQL backend would need no such
> integer: you would in fact have to *add* a column to hold that value
> just so the data would merge efficiently with a catalog.  This seems
> antithetical to Zope 3's philosophy in general, as it violates the
> principle of not requiring third-party libraries and data to change
> significantly just to work with Zope.  Of course, this isn't a problem
> of the catalog; it's a general problem of merging search results from
> anywhere.

Note that RDBMS-based applications will *already* impose such a
requirement from the moment you want to join results from the RDF
query to those from any other tables:  every non-toy RDBMS in existence
has a "preferred primary key" type, which is an integer, for precisely
the same reason (to allow efficient joins).

RDBMS best practices insist that "normal" tables have a primary key of
that type, whose value is supposed to remain invisible (or at least
opaque) to humans.
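A minimal sketch of that practice, using Python's standard `sqlite3` module: each URI is stored once in a lookup table keyed by an `INTEGER PRIMARY KEY`, and both the RDF triples and an unrelated application table refer to resources by that integer, so the join compares integers rather than strings. The schema, table and column names, and the `resource_id` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE resources (
    id  INTEGER PRIMARY KEY,   -- opaque surrogate key
    uri TEXT NOT NULL UNIQUE   -- the human-meaningful value
);
CREATE TABLE triples (
    subject   INTEGER NOT NULL REFERENCES resources(id),
    predicate INTEGER NOT NULL REFERENCES resources(id),
    object    INTEGER NOT NULL REFERENCES resources(id)
);
CREATE TABLE documents (
    resource_id INTEGER PRIMARY KEY REFERENCES resources(id),
    title       TEXT
);
""")


def resource_id(uri):
    """Return the integer key for a URI, inserting the URI if it is new."""
    cur.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    cur.execute("SELECT id FROM resources WHERE uri = ?", (uri,))
    return cur.fetchone()[0]


doc = resource_id("http://example.org/doc/1")
creator = resource_id("http://purl.org/dc/elements/1.1/creator")
alice = resource_id("http://example.org/people/alice")
cur.execute("INSERT INTO triples VALUES (?, ?, ?)", (doc, creator, alice))
cur.execute("INSERT INTO documents VALUES (?, ?)", (doc, "Design notes"))

# The joins compare integer keys; the URI text is only fetched for display.
rows = cur.execute("""
    SELECT d.title, r.uri
    FROM triples t
    JOIN documents d ON d.resource_id = t.subject
    JOIN resources r ON r.id = t.object
    WHERE t.predicate = ?
""", (creator,)).fetchall()
print(rows)  # [('Design notes', 'http://example.org/people/alice')]
```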

If we want to allow for scalable use of rdflib, I would guess we need to
"promote" the integer ID from "implementation detail" to a first-class
API citizen.
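As a rough sketch of what "first-class" might mean, here is a hypothetical in-memory triple store whose integer term ids are part of its public interface: terms are interned to ints, triples are indexed as ints, and a query can hand back the raw int set. The optional `allocate` callable is an assumption standing in for a shared id space (e.g. one also used by a catalog); nothing here is rdflib's actual API, and plain sets again stand in for `IFBTree`.

```python
import itertools
from collections import defaultdict


class IntTripleStore:
    """Hypothetical store that makes its integer term ids part of the API."""

    def __init__(self, allocate=None):
        # `allocate(term) -> int` lets the caller supply the id space, e.g.
        # one shared with a catalog; by default ids come from a local counter.
        counter = itertools.count(1)
        self._allocate = allocate or (lambda term: next(counter))
        self._term_to_id = {}
        self._id_to_term = {}
        self._po_index = defaultdict(set)  # (pred id, obj id) -> subject ids

    def to_id(self, term):
        """Return the int for a term, allocating one on first use."""
        if term not in self._term_to_id:
            new_id = self._allocate(term)
            self._term_to_id[term] = new_id
            self._id_to_term[new_id] = term
        return self._term_to_id[term]

    def to_term(self, term_id):
        return self._id_to_term[term_id]

    def add(self, subject, predicate, obj):
        s, p, o = self.to_id(subject), self.to_id(predicate), self.to_id(obj)
        self._po_index[(p, o)].add(s)

    def subjects_raw(self, predicate, obj):
        """Subject ids matching (?, predicate, obj), not converted to terms."""
        key = (self._term_to_id.get(predicate), self._term_to_id.get(obj))
        # Return a copy so callers can mutate it without touching the index.
        return set(self._po_index.get(key, ()))


store = IntTripleStore()
store.add("ex:doc1", "dc:creator", "ex:alice")
store.add("ex:doc2", "dc:creator", "ex:alice")
store.add("ex:doc2", "dc:creator", "ex:bob")
hits = store.subjects_raw("dc:creator", "ex:alice")
print(sorted(store.to_term(i) for i in hits))  # ['ex:doc1', 'ex:doc2']
```

The raw result can only be merged directly with catalog results if both sides draw ids from the same allocator; that shared id space is the part this sketch merely assumes.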

> I'd like to make the optimization available so that searches on a graph
> can be efficiently merged with searches on a catalog, but I don't think
> it can be done by pushing intids down into rdflib, or for that matter
> any other third party component you want to play with the catalog
> efficiently.  Perhaps instead of pushing the integers down we could push
> URIs up: Zope's cataloging could grow another layer of indirection on
> top of intids and provide a URI utility that maps to intids.  Of course
> you might object to that for the same reasons I'm objecting to this. ;)
> But at least URIs are a well known standard.
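A minimal sketch of the "push URIs up" idea: a hypothetical URI utility layered over existing intids, plus a merge step that translates a URI-based result (say, from a graph query) into intids before intersecting it with a catalog result. `URIRegistry`, `queryId`, and `merge_uri_results` are illustrative names, not an existing Zope API.

```python
class URIRegistry:
    """Hypothetical utility mapping URIs to (and from) existing intids."""

    def __init__(self):
        self._uri_to_id = {}
        self._id_to_uri = {}

    def register(self, uri, intid):
        self._uri_to_id[uri] = intid
        self._id_to_uri[intid] = uri

    def queryId(self, uri, default=None):
        return self._uri_to_id.get(uri, default)

    def getURI(self, intid):
        return self._id_to_uri[intid]


def merge_uri_results(registry, uri_results, catalog_ids):
    """AND a URI-based result with a set of catalog intids.

    Each URI costs one lookup to translate; URIs with no intid cannot
    match a cataloged object and are dropped.
    """
    translated = set()
    for uri in uri_results:
        intid = registry.queryId(uri)
        if intid is not None:
            translated.add(intid)
    return translated & set(catalog_ids)


reg = URIRegistry()
reg.register("http://example.org/doc/1", 101)
reg.register("http://example.org/doc/2", 102)
graph_hits = ["http://example.org/doc/1", "http://example.org/doc/2",
              "http://example.org/unregistered"]
catalog_hits = {102, 103}
print(merge_uri_results(reg, graph_hits, catalog_hits))  # {102}
```

The translation loop runs over every URI on every merge, which is exactly the per-result conversion that integer ids are meant to avoid.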

They are known, but they are an *infeasible* join key (not only are they
strings, but as arbitrary-length strings with common prefixes, their
sorting semantics are almost worst-case for many join algorithms).
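A small illustration of the common-prefix problem, using made-up `example.org` URIs: neighbours in sorted order typically share a long prefix, so every string comparison in a sort or merge join has to walk past it before finding the first differing character. Comparing two machine integers, by contrast, is a single fixed-size comparison. This only shows comparison cost, not a full join benchmark.

```python
import os.path


def shared_prefix_len(a, b):
    """Number of leading characters two strings have in common."""
    return len(os.path.commonprefix([a, b]))


uris = sorted([
    "http://example.org/people/carol",
    "http://example.org/people/alice",
    "http://example.org/people/bob",
])

# Adjacent keys in sorted order: a string comparison must scan the shared
# prefix before it reaches the first differing character.
for a, b in zip(uris, uris[1:]):
    print(shared_prefix_len(a, b), a, b)
```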

<snip>

>>Have you thought about that use case?  If one used a variation of  
>>your back end that assigned intids to non-intid-based resources like  
>>URIs and Literals and stored the relationships via intids, 
> 
> 
> One doesn't need a variation, this is exactly the way the in-memory and
> ZODB backends work now as an optimization.  But they are internal
> details of the implementation of those backends.

As I argue above, I believe this to be a false encapsulation.

>>you could  
>>store the data as IFBTrees and offer up an API to get "raw" IFBTree  
>>results.  Any obvious ways that would be a problem?  Does it feel  
>>reasonable to you?  Any suggestions?
> 
> 
> Well, not any good ones yet, although I know it's an important problem.
> I'll have to think about it a bit more.  Do you understand my
> objections?  Does anyone else out there have any suggestions?  This is
> probably worth solving in the general case, since it's going to come up
> any time you want to merge catalog results with anything.
> 
> 
>>I'm generally interested in RDFLib, your use of it, and your hopes  
>>for it, if you feel like holding forth. :-)
> 
> 
> Great!  And I didn't even have to feed you any Kool-Aid or buy you a
> bottle of aquavit. ;)

Now if only I *liked* caraway-in-a-bottle. ;)

Tres.
--
===================================================================
Tres Seaver          +1 202-558-7113          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com