[Zope3-dev] Re: RDFLib and Zope 3
Tres Seaver
tseaver at palladion.com
Tue Aug 23 16:26:42 EDT 2005
Michel Pelletier wrote:
> On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
>
>>Michel (and anyone else with experience with RDFLib on the list), I
>>recently looked at RDFLib (http://rdflib.net/) and came away (after
>>an hour or so) with a good first impression.
>
>
> Great. I've cc:ed Dan Krech, the lead rdflib developer on this mail.
> For his benefit I might explain things that you obviously know.
>
>
>>My biggest disappointment was that, from the perspective of a Zope 3
>>developer, using it alongside other Zope 3 indexes (and other intid-
>>based data structures) meant that I would have to externally convert
>>to and from RDF in order to merge results and convert the RDF URIs to
>>objects.
>
>
> Correct. A specific and important optimization in Zope-style cataloging
> is that objects have a cheap unique integer to reduce catalog footprint
> and significantly improve result merging and joining. These integers
> are exposed as a utility component in Zope.
>
>
>> It would be much more efficient if I could have an RDF
>>resource class that represented an intid, and even more efficient if
>>I could get IFBTrees back directly from searches that somehow
>>included the intids.
>
>
> Yes, this is a problem that needs to be solved, and your suggestion is
> one way to solve it. I've discussed this a few times with Florent at the
> Paris and EUpy sprints and he had a similar suggestion.
>
> I'm uncomfortable with it for a few reasons, 1) because intids are such
> a Zope-catalog-optimization specific thing. I know why they are
> exposed, so that catalog results can be efficiently merged, but they
> don't have anything to do with RDF, so 2) rdflib can't really change its
> interface to accommodate them. Also, 3) they are backend specific, for
> example rdflib has a URI -> integer mapping for its in-memory and ZODB
> backends to reduce footprint, but a sql backend would need no such
> integer, you would in fact have to *add* a column to hold that value
> just so the data would merge efficiently with a catalog. This seems
> antithetical to Zope 3's philosophy in general as it violates the
> concept of not requiring third party libs and data to change themselves
> significantly just to work with Zope. Of course, this isn't a problem
> of the catalog, it's a problem in general merging search results from
> anywhere.
Note that RDBMS-based applications will *already* impose such a
requirement, from the moment that you want to join results from the RDF
query to those from any other tables: every non-toy RDBMS in existence
has a "preferred primary key" type, which is an integer, for precisely
the same reasons (to allow efficient joins).
RDBMS best practices insist that "normal" tables have a primary key of
that type, whose value is supposed to remain invisible (or at least
opaque) to humans.
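As a rough illustration of that surrogate-key pattern, here is a minimal sketch using Python's standard sqlite3 module. The table names, column names, example URIs, and the resource_id() helper are all made up for this example; nothing here is rdflib's actual SQL schema. The point is only that once URIs are mapped to integer keys, joining RDF results against any other table is an ordinary integer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate integer key; the URI is just an attribute of the row.
    CREATE TABLE resource (
        id  INTEGER PRIMARY KEY,
        uri TEXT NOT NULL UNIQUE
    );
    -- Triples reference resources by integer key only.
    CREATE TABLE triple (
        subject   INTEGER NOT NULL REFERENCES resource(id),
        predicate INTEGER NOT NULL REFERENCES resource(id),
        object    INTEGER NOT NULL REFERENCES resource(id)
    );
    -- An unrelated application table keyed by the same integers.
    CREATE TABLE document (
        resource_id INTEGER PRIMARY KEY REFERENCES resource(id),
        title       TEXT NOT NULL
    );
""")

def resource_id(uri):
    """Return the integer key for a URI, creating a row if it is new."""
    conn.execute("INSERT OR IGNORE INTO resource (uri) VALUES (?)", (uri,))
    row = conn.execute(
        "SELECT id FROM resource WHERE uri = ?", (uri,)
    ).fetchone()
    return row[0]

def add_triple(s, p, o):
    conn.execute(
        "INSERT INTO triple VALUES (?, ?, ?)",
        (resource_id(s), resource_id(p), resource_id(o)),
    )

TOPIC = "http://example.org/ns#topic"
add_triple("http://example.org/doc/1", TOPIC, "http://example.org/topic/rdf")
add_triple("http://example.org/doc/2", TOPIC, "http://example.org/topic/sql")
conn.execute("INSERT INTO document VALUES (?, ?)",
             (resource_id("http://example.org/doc/1"), "RDFLib notes"))
conn.execute("INSERT INTO document VALUES (?, ?)",
             (resource_id("http://example.org/doc/2"), "SQL notes"))

# The join between the RDF data and the other table is on integers only.
rows = conn.execute(
    """
    SELECT d.title
    FROM triple t
    JOIN document d ON d.resource_id = t.subject
    WHERE t.predicate = ? AND t.object = ?
    """,
    (resource_id(TOPIC), resource_id("http://example.org/topic/rdf")),
).fetchall()
titles = [r[0] for r in rows]
```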
If we want to allow for scalable use of rdflib, I would guess we need to
"promote" the integer ID from "implementation detail" to a first-class
API citizen.
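To make the "first-class" idea concrete, here is a toy sketch under stated assumptions. IdGraph and its methods are hypothetical and are not part of rdflib; plain Python sets stand in for the IFBTree-style result sets a Zope catalog would return, and the ids are assigned locally rather than by Zope's intid utility. It shows a store that keeps triples as integer tuples and offers an id-level query next to the usual term-level one, so results can be merged without round-tripping through URIs.

```python
class IdGraph:
    """Toy triple store that keeps triples as tuples of integer ids."""

    def __init__(self):
        self._id_for = {}      # term -> int
        self._term_for = []    # int -> term (the list index is the id)
        self._triples = set()  # {(s_id, p_id, o_id)}

    def id_for(self, term):
        """Public: return the integer id for a term, assigning one if new."""
        try:
            return self._id_for[term]
        except KeyError:
            new_id = len(self._term_for)
            self._id_for[term] = new_id
            self._term_for.append(term)
            return new_id

    def term_for(self, term_id):
        """Public: map an integer id back to its term."""
        return self._term_for[term_id]

    def add(self, s, p, o):
        self._triples.add((self.id_for(s), self.id_for(p), self.id_for(o)))

    def subject_ids(self, p, o):
        """Id-level query: subjects as a set of ints, ready for merging.

        Note: in this toy, querying an unknown term assigns it an id.
        """
        p_id, o_id = self.id_for(p), self.id_for(o)
        return {s for (s, p2, o2) in self._triples
                if p2 == p_id and o2 == o_id}

    def subjects(self, p, o):
        """Term-level query, built on top of the id-level one."""
        return {self.term_for(i) for i in self.subject_ids(p, o)}


g = IdGraph()
g.add("urn:doc:1", "urn:p:topic", "urn:t:rdf")
g.add("urn:doc:2", "urn:p:topic", "urn:t:rdf")
g.add("urn:doc:3", "urn:p:topic", "urn:t:sql")

# Stand-in for an integer result set from some other index (e.g. a catalog).
other_index_hits = {g.id_for("urn:doc:2"), g.id_for("urn:doc:3")}

# The merge is a plain integer-set intersection.
merged = g.subject_ids("urn:p:topic", "urn:t:rdf") & other_index_hits
result = sorted(g.term_for(i) for i in merged)
```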
> I'd like to make the optimization available so that searches on a graph
> can be efficiently merged with searches on a catalog, but I don't think
> it can be done by pushing intids down into rdflib, or for that matter
> any other third party component you want to play with the catalog
> efficiently. Perhaps instead of pushing the integers down we could push
> URIs up: Zope's cataloging could grow another layer of indirection on
> top of intids and provide a URI utility that maps to intids. Of course
> you might object to that for the same reasons I'm objecting to this. ;)
> But at least URIs are a well known standard.
They are known, but they are an *infeasible* join key (not only are they
strings, but as arbitrary-length strings with common prefixes, their
sorting semantics are almost worst-case for many join algorithms.)
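A small illustration of the common-prefix point, not a benchmark: the chars_examined() helper and the example URIs are made up for this sketch. It counts how many characters a naive left-to-right comparison has to inspect before two URIs can be told apart, where two integer ids would need a single comparison.

```python
def chars_examined(a, b):
    """Count the characters a naive left-to-right comparison inspects."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            break
    return n

uri_a = "http://example.org/content/documents/2005/08/report-0001"
uri_b = "http://example.org/content/documents/2005/08/report-0002"

# These two URIs differ only in their final character, so the comparison
# has to walk the entire shared prefix before it can decide an ordering.
cost = chars_examined(uri_a, uri_b)
```

Sort- and merge-based joins repeat that comparison many times, which is why keys that share long prefixes are close to the worst case.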
<snip>
>>Have you thought about that use case? If one used a variation of
>>your back end that assigned intids to non-intid-based resources like
>>URIs and Literals and stored the relationships via intids,
>
>
> One doesn't need a variation, this is exactly the way the in-memory and
> ZODB backends work now as an optimization. But they are internal
> details of the implementation of those backends.
As I argue above, I believe this to be a false encapsulation.
>>you could
>>store the data as IFBTrees and offer up an API to get "raw" IFBTree
>>results. Any obvious ways that would be a problem? Does it feel
>>reasonable to you? Any suggestions?
>
>
> Well not any good ones yet, although I know it's an important problem.
> I'll have to think about it a bit more. Do you understand my
> objections? Does anyone else have any suggestions out there? This is
> probably worth solving in the general case, since it's going to come up
> anytime you're going to want to merge catalog results with anything.
>
>
>>I'm generally interested in RDFLib, your use of it, and your hopes
>>for it, if you feel like holding forth. :-)
>
>
> Great! And I didn't even have to feed you any kool aid or buy you a
> bottle of aquavit. ;)
Now if I only *liked* caraway-in-a-bottle. ;)
Tres.
--
===================================================================
Tres Seaver +1 202-558-7113 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com