[ZODB-Dev] ZODB-level indexing

Wed Nov 5 03:27:42 EST 2003

There's a French writer, Georges Perec, who wrote a book called La
Disparition. The English translation, by Gilbert Adair, was titled "A Void".

    "It is the story of the disappearance of a man; and in the world
    from where that man disappeared, the letter "E" disappeared as well"

    http://www.themodernword.com/scriptorium/perec.html

So the whole book, even in translation, does not contain the vowel E. In
the epilogue the author explains that this started out as a challenge
from a friend but, as he continued, this *constraint* opened up totally
new linguistic highways and byways, to the extent that he stopped all
his other work to concentrate on this book.

His words always inspire me when dealing with ridiculous constraints.
The constraint in this case is our excruciatingly slow internet
connection, and having to do replication over it.

We built a custom storage that writes pickles out to the filesystem in
addition to storing them in a local FileStorage. Pickles get replicated
by a separate process to remote hosts and are imported there by a thread
spawned by the storage itself when opened. We noticed a couple of things
when building this storage - all are particular to Zope - but I think
the solution lies in ZODB territory.
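
For anyone who wants a picture of the setup, here is a rough Python
sketch of the idea. It is not our actual code: DictStorage,
SpoolingStorage and import_spool are made-up names, a dict stands in for
FileStorage, and store(oid, data) is far simpler than the store method
of a real ZODB storage.

```python
import os


class DictStorage:
    """Hypothetical stand-in for the local FileStorage: oid -> pickle bytes."""

    def __init__(self):
        self.records = {}

    def store(self, oid, data):
        self.records[oid] = data


class SpoolingStorage:
    """Hypothetical wrapper: store locally and also write the pickle to disk."""

    def __init__(self, base, spool_dir):
        self.base = base
        self.spool_dir = spool_dir

    def store(self, oid, data):
        self.base.store(oid, data)
        final = os.path.join(self.spool_dir, oid.hex() + ".pickle")
        tmp = final + ".tmp"
        # Write under a temporary name and rename afterwards, so the
        # replicating process never picks up a half-written file.
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, final)


def import_spool(spool_dir, target):
    """Roughly what the importing thread on the remote host would do."""
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".pickle"):
            continue  # skip files that are still being written
        oid = bytes.fromhex(name[: -len(".pickle")])
        with open(os.path.join(spool_dir, name), "rb") as f:
            target.store(oid, f.read())
```

The separate replication process only has to copy files from one spool
directory to another; it never needs to know anything about ZODB.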

Zope's PropertyManager is indiscriminate when changing attributes on
objects: it sets them whether or not the values actually changed. This
is somewhat OK *unless* you use ZCatalog for indexing. If you modify a
couple of objects in one transaction, each with different indexed
attributes, this indiscriminateness bites you: it bloats the transaction
with unnecessary changes to indexes. But this is not the biggest
problem. A lot of object invalidations occur at the same time, causing a
lot of unnecessary traffic to update the invalidated objects. One can
limit this effect somewhat by making PropertyManager smart enough to
only change attributes whose values actually changed, but legitimate
changes to properties still cause a lot of invalidations to catalog
indexes.
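
By "smart enough" I mean something along these lines. This is only a
sketch in plain Python, not a patch against PropertyManager:
update_properties is a made-up helper, and it assumes property values
can be compared with !=.

```python
_MISSING = object()  # sentinel for "attribute not set yet"


def update_properties(obj, new_values):
    """Set only the attributes whose value differs from the current one.

    Returns the names that were really changed, so the caller can
    decide whether anything needs reindexing at all.
    """
    changed = []
    for name, value in new_values.items():
        if getattr(obj, name, _MISSING) != value:
            setattr(obj, name, value)
            changed.append(name)
    return changed


class Doc:
    """Hypothetical stand-in for a persistent object with properties."""


doc = Doc()
doc.title = "A Void"
doc.pages = 300

# Only 'pages' differs, so only 'pages' is written.
changed = update_properties(doc, {"title": "A Void", "pages": 290})
```

If the returned list is empty the object was never touched, so there is
nothing to reindex and nothing to invalidate.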

This is surely not only a problem on a slow connection. I can imagine
that a ZEO cluster might take a performance hit in apps where ZCatalog
is integral to the app. The idea I am toying with at the moment is to
have indexing happen at the ZODB storage level, similar to what you get
when using an RDBMS. I don't know if this is even feasible, because the
storage only sees pickles and you'd have to unpickle before indexing
becomes possible. Or maybe index just before an object is pickled.
Another way would be to put the Catalog on a separate mount point and
have a hook that indexes invalidated objects as soon as they are
updated. I don't know and still need to explore all the options. At the
moment I am just curious whether this is a concern for anybody else?
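
To show what I have in mind with "unpickle before indexing", here is a
toy Python sketch. It is an assumption-laden illustration, not a
proposal for an API: IndexingStorage is a made-up class, each record is
assumed to be a pickled dict of hashable attribute values (real ZODB
records are more involved than that), and the index lives in memory
next to the storage.

```python
import pickle
from collections import defaultdict


class IndexingStorage:
    """Hypothetical storage that maintains its own indexes on store()."""

    def __init__(self, indexed_attrs):
        self.records = {}  # oid -> pickle bytes
        self.indexed_attrs = list(indexed_attrs)
        # attribute name -> value -> set of oids
        self.index = {attr: defaultdict(set) for attr in self.indexed_attrs}

    def store(self, oid, data):
        old = self.records.get(oid)
        if old is not None:
            # Drop the entries that came from the previous state.
            self._update(oid, pickle.loads(old), add=False)
        self.records[oid] = data
        # The storage only sees pickles, so unpickle before indexing.
        self._update(oid, pickle.loads(data), add=True)

    def _update(self, oid, state, add):
        for attr in self.indexed_attrs:
            if attr not in state:
                continue
            oids = self.index[attr][state[attr]]
            if add:
                oids.add(oid)
            else:
                oids.discard(oid)

    def search(self, attr, value):
        return set(self.index[attr].get(value, ()))
```

The index is updated where the record is written, so no index objects
are changed in the client's transaction and none of them need to be
invalidated on other clients.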

-- 
Roché Compaan
Upfront Systems                 http://www.upfrontsystems.co.za