[ZODB-Dev] Searching/wo/Zope

Tim Peters tim at zope.com
Wed Jan 4 23:05:32 EST 2006


[Tamas Hegedus]
> Please do not forget that I am not a real programmer but a consumer:

That's OK -- in return, please don't forget that you're posting to a ZODB
developers' list ;-).

> 0. I accept it if the policy is to make ZODB just for Zope.

That's the only solid reason Zope Corp has to _pay_ for ZODB development.
Zope pays the bills here, and ZODB is supporting infrastructure for Zope.
ZODB is an open project, though, and others are free to contribute, or even
to pay others for, ZODB work that doesn't directly support a glaring Zope
need.

> Then OK. But if there is bigger potential inside it...

Then what, specifically?  Nobody works on something unless they want to,
and/or are paid to.  It's not a matter of cheerleading, it's a matter of
someone doing the work.

For example, I'm sure that someone so motivated could produce an add-on
package for ZODB supporting some vision of automatic index/catalog/search.
Most people who are likely to contribute time and/or money to do such a
thing are likely to do it in the context of Zope, though, simply because
Zope pays their bills too.  Dieter Maurer, for instance, has done some hard
work on speeding up searches (see IncrementalSearch and IncrementalSearch2 at:

    http://www.dieter.handshake.de/pyprojects/zope

).

A notable exception is IndexedCatalog:

    http://www.async.com.br/projects/IndexedCatalog/

which is independent of Zope.  You said before that you thought that wasn't
active, and it indeed doesn't look like it's had a release recently.  That
could be because it's already perfect ;-) -- or it could be that there's not
a large enough community who wants it to support ongoing development.  I
don't know.

> (By the way: I think it is great and big, and I would like to use it.)
> To put this more realistically: it seems to me that there
> is no potential to take care of this extra project outside of Zope
> AND/OR it would not be good for Zope developers to have it as an
> easy-to-use stand-alone module (maybe some business policy?).

Not sure I followed that.

> 1. """That's usually viewed as an application-level problem, and it's up
> to applications to solve it in ways best suited for their particular
> needs.""" If I translate this for myself and understand it correctly: I am
> very happy that RDBMSs do not say this, and I can search them not only by
> primary keys; I am happy that I do not have to implement something
> similar to SQL, as it is not considered an "application level problem".

A relational database forces you to slam all your data into uniform tables,
regardless of whether that's a natural fit.  When all you have is uniform
tables, then it's relatively easy to define uniform operators for crawling
over those tables -- that's what SQL is all about.
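To make the contrast concrete, here is a small sketch using Python's built-in sqlite3 module (the table, its columns and its rows are invented for illustration). Because every row has the same columns, one generic operator, WHERE, can search on any column, not just the primary key.

```python
import sqlite3

# Throwaway in-memory database with one uniform table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [("Ann", "Boston"), ("Bela", "Budapest"), ("Cy", "Boston")],
)

# Search on a non-key column: SQL knows every row has a "city" field.
rows = conn.execute(
    "SELECT name FROM people WHERE city = ? ORDER BY name", ("Boston",)
).fetchall()
print(rows)   # [('Ann',), ('Cy',)]
conn.close()
```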

An object database is more of a general graph structure, and an
application's idea of "search" can be correspondingly semantically richer
from the start -- or even irrelevant, if the object graph is constructed
from the start to make traversals of potential interest follow the natural
graph pointers.  What's the analogue to SQL in this quite different view of
the world?  Well, there isn't a standard accepted vision for that.  That's
what makes it the app's problem.  These are tradeoffs.  Zope's assorted
indices and catalogs _probably_ capture some notion of "search" close to
what you're after.
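For comparison, a plain-Python sketch of the object-graph view, using hypothetical Library, Author and Book classes. Persistence is left out so it runs without ZODB installed; in a real ZODB application such classes would typically subclass persistent.Persistent. The "search" is nothing more than following references.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Book:
    title: str
    year: int


@dataclass
class Author:
    name: str
    books: List[Book] = field(default_factory=list)   # direct pointers to Books


@dataclass
class Library:
    # Hypothetical top-level object: authors reachable by name.
    authors: Dict[str, Author] = field(default_factory=dict)


lib = Library()
tim = Author("Tim")
tim.books.append(Book("SpamBayes notes", 2003))
tim.books.append(Book("ZODB notes", 2006))
lib.authors["Tim"] = tim

# "Which of Tim's books are newer than 2004?" is answered by walking
# library -> author -> books; no query language is involved.
recent = [b.title for b in lib.authors["Tim"].books if b.year > 2004]
print(recent)   # ['ZODB notes']
```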

> 2. BTrees: I could not find any 'built-in' possibility in the docs, just
> the 'primary keys'. If I check the OOBTree, etc., it just gives
> 'difference', 'intersection', 'union'. I do not see how to do full-text
> search or field search on BTrees. Am I missing something???

BTrees map keys to values.  The keys are always maintained in sorted order,
and it's both dead easy and efficient to do range searches over a BTree's
key space.  That's what's built in.

> (I do not think as if I would than you would not call the problem
> "application" level problem).

It depends on the app.  I gave SpamBayes as a concrete example of a real app
where the builtin abilities of BTrees do all the searching the app requires
(and it's not an accident that SpamBayes was designed that way -- just as it
wouldn't have been an accident if an RDBMS fan had designed SpamBayes to work
directly with simple relational tables).

> 3. I cannot build up another database from ZODB, as I am not a
> developer.

Do you use Python?  I'm at a bit of a loss to figure out how you wound up
posting here if you're _not_ a Python programmer.  It could be that ZODB is
much more general than I thought ;-), but I didn't think non-programmers
would have any use for it.

> But I think you did not formulate this in the best way: I think you do not
> build the SB database OUT of ZODB's BTrees, I think you just build
> up indexes from the BTrees and you implement searches on your indexes
> that point back to the BTrees.

I suppose you could think of it that way, but I designed SpamBayes and
that's not how I thought of it.  I thought of it in terms of abstract
mappings, then designed the main algorithms to work directly with BTrees.
ZODB supplies persistent BTrees, and that's all SB needs.

> => If you build up a new database why do you use ZODB?

Of course there are many possible reasons.  SpamBayes has actually been
using Berkeley DB by default so far, and is moving to ZODB primarily because
we're sick of frequent database corruption problems with BDB, and partly
because it was designed to use ZODB to begin with ;-)

A more general good reason to use ZODB is that you want to use Python, and
the semantics ("meaning") of your data are naturally modeled by a graph of
Python objects.  If the semantics are more naturally captured by a
spreadsheet, ZODB is probably a more dubious choice.

> => If you just build indexes from the BTrees, the following protocol
> works for me and you can suggest?

Not sure I'm following.  I can suggest what?

> 1. walk through your BTree, taking each object

A BTree is a collection of <key, value> pairs, and I'm unsure what "object"
means here.

> 2. with an external indexing application build the index (on one or more
> fields, or full text)
> 3. search in your index that returns with the 'primary key' of objects
> in the ZODB
> 4. get the objects from the ZODB via the 'primary keys' from the prev
> step. ???

OK, now I'm sure I'm not following.  You appear to be assuming much more
structure than a plain BTree supports on its own, and in fact BTrees don't
really _appear_ to have anything to do with what you're saying.  If you
think _your_ objects have such things as "fields" and "primary keys", then
that's part of your objects' design and your objects' implementations --
objects don't come with such notions built in.  It sounds like you have RDBMS
tables in mind, and are forcing object language on top of them.

If so, that's fine -- it's legitimate to do so.  It sounds like you'd be
happier then with an RDBMS, though (under the inference that you _think_ in
terms of tables rather than in terms of objects -- and, for all I know, that
may be thoroughly appropriate for your apps (of which I know nothing)).

Anyway, the bottom line remains that there are no current plans to add
index/catalog/search code to the ZODB distribution.  The only way that can
change is if someone contributes the work.
