[ZODB-Dev] Indexing objects in a ZODB

Greg Ward gward@cnri.reston.va.us
Tue, 13 Feb 2001 19:55:49 -0500


Hi all --

I'm trying to figure out how to build an index of objects in a ZODB
database.  Clearly there's some code for supporting this -- I've poked
through the SearchIndex and Catalog packages (from the SourceForge CVS of
Andrew's ZODB/ZEO release), but I couldn't really figure out what's going on
there.

Here's the basic idea: we have a handful of singleton classes, instances of
which are top-level database objects (ie. elements of the root dictionary).
Each such "root object" owns collections of our major database objects,
currently stored as BTrees mapping ID strings to objects.  Eg. a singleton
class might be

class Library (Persistent):
    def __init__ (self):
        self.books = BTree.BTree()    # maps ID to Book objects

and its "data class" might be

class Book (Persistent):
    def __init__ (self):
        self.id = None
        self.title = None
        self.year = None
        self.author = None
        # ...

One way to index our collection of Books is to add indices to Library: one
would map 'title' values to Book objects (or IDs?), another would map 'year'
values, and so on.  Thus, we might add the following to Library's constructor:

        self.title_index = BTree.BTree()  # map title values to Book objects
        self.author_index = BTree.BTree() # map author values to Book objects
        self.year_index = ... # you get the idea

A cleaner solution would of course be:

        self.indices = {
            'title': BTree.BTree(),
            'author': BTree.BTree(),
            # ...
        }

(I'm not worried about using a plain vanilla dictionary, since I don't
expect to be adding/removing elements to/from this dictionary very often.)
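A minimal sketch of how such an indices dictionary might be kept in sync with `books`. It is plain Python with no ZODB involved: ordinary dicts stand in for `BTree.BTree()`, and `INDEXED_ATTRS`, `add_book`, `remove_book` and `find` are hypothetical names, not part of any ZODB API. Because several books can share a title, author or year, each index maps a value to a list of book IDs rather than to a single Book.

```python
# ZODB-free sketch: dicts stand in for BTree.BTree(); all names are hypothetical.

INDEXED_ATTRS = ('title', 'author', 'year')   # assumed set of indexed fields

class Book:
    def __init__(self, id, title=None, year=None, author=None):
        self.id = id
        self.title = title
        self.year = year
        self.author = author

class Library:
    def __init__(self):
        self.books = {}      # maps ID -> Book (a BTree in the real schema)
        # one index per attribute: attribute value -> list of book IDs
        self.indices = {}
        for attr in INDEXED_ATTRS:
            self.indices[attr] = {}

    def add_book(self, book):
        self.books[book.id] = book
        for attr, index in self.indices.items():
            value = getattr(book, attr)
            # build a new list and store it, instead of appending in place
            index[value] = index.get(value, []) + [book.id]

    def remove_book(self, book_id):
        book = self.books.pop(book_id)
        for attr, index in self.indices.items():
            value = getattr(book, attr)
            ids = [i for i in index[value] if i != book_id]
            if ids:
                index[value] = ids
            else:
                del index[value]     # drop keys that no longer match any book

    def find(self, attr, value):
        """Return the Book objects whose `attr` equals `value`."""
        return [self.books[i] for i in self.indices[attr].get(value, [])]
```

Storing a fresh list on every update, rather than appending to the existing one, matters once the indices are real BTrees: a plain list stored as a BTree value is not itself persistent, so ZODB would not notice an in-place append.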

First question: is this the "right way" to do it?  A slight (and I think
inconsequential) variation is to use book IDs (strings instead of Book
instances) as the values of the index BTrees, but that's not fundamental --
it's still a mapping of strings (attribute values) to lists of Python
objects (either strings or instances; it doesn't matter, since it's just one
more reference to something already in the DB).  Is there a more
efficient way to index this type of data?  I can't see an obvious way to
leverage the intSet extension type included with ZODB, because there's no
obvious int -> Book mapping.
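One way to get an int -> Book mapping is to create it: assign each Book a small integer "document ID" as it is indexed, keep a table from those integers back to Books, and let each index map an attribute value to a set of integers. A query on several attributes then becomes a set intersection. The sketch below is plain Python under that assumption; the built-in `set` stands in for an integer-set type such as intSet, and `BookCatalog`, `index_book` and `search` are hypothetical names.

```python
# Sketch of the integer-docid approach; set() stands in for an intSet-like type.
# All class and method names are hypothetical.

class Book:
    def __init__(self, id, title=None, year=None, author=None):
        self.id = id
        self.title = title
        self.year = year
        self.author = author

class BookCatalog:
    def __init__(self, attrs):
        self.next_docid = 0
        self.docid_to_book = {}       # int -> Book: the int -> Book mapping
        self.indices = {}
        for attr in attrs:
            self.indices[attr] = {}   # attribute value -> set of int docids

    def index_book(self, book):
        docid = self.next_docid
        self.next_docid += 1
        self.docid_to_book[docid] = book
        for attr, index in self.indices.items():
            index.setdefault(getattr(book, attr), set()).add(docid)
        return docid

    def search(self, **criteria):
        """Return the books matching every attr=value pair (set intersection)."""
        result = None
        for attr, value in criteria.items():
            docids = self.indices[attr].get(value, set())
            result = docids if result is None else result & docids
        if result is None:            # no criteria given
            return []
        return [self.docid_to_book[d] for d in sorted(result)]
```

For example, `search(author='Herbert', year=1969)` intersects the docid set for 'Herbert' with the one for 1969, touching only integers until the final lookup of the surviving Books.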

Second question: I propose rolling my own index because I don't understand
the tools provided by ZODB (Catalog, SearchIndex).  Can I use those tools
for this sort of situation?  Is there an explanation somewhere of how to use
them?  (I've tried RTFS'ing, and it didn't help much.)

Thanks --

        Greg