[ZODB-Dev] Finding objects by attribute value?

Thu Jan 6 11:45:49 EST 2005

[Christian Robottom Reis]
> Do you think a server-side indexing mechanism with query caching could be
> a significant performance enhancement? I'm sure you don't *know*, but
> your opinion would be interesting to have here.

Jeez, there are so many parts to this.  To the "indexing" part, sure <wink>.
To the "query caching" part, it depends entirely on app specifics -- caches
are a pure loss unless clients actually reuse queries, and a net loss unless
clients reuse them often enough to more than overcome the extra overheads
and fragility caused by trying to cache (caching competes for RAM, I/O
bandwidth, and processor cycles too, and a ZEO server machine may not have
any of those to spare right now -- depending on how big the database is, how
many clients it's serving, how active the clients are, how beefy the
hardware is, etc).
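
To put rough numbers on the "pure loss / net loss" distinction, here is a
minimal sketch of the break-even arithmetic. The cost parameters are
hypothetical per-query averages in whatever unit you like; nothing here
measures a real ZEO server, and the model ignores second-order effects such
as the cache competing for RAM.

```python
def cache_net_gain(hit_rate, query_cost, lookup_cost, maintenance_cost):
    """Average cost saved per query by caching (negative means a net loss).

    All parameters are hypothetical per-query costs in the same unit.
    """
    # Without a cache, every query pays the full query cost.
    without_cache = query_cost
    # With a cache, every query pays for the lookup and the upkeep;
    # misses pay the full query cost on top of that.
    with_cache = lookup_cost + maintenance_cost + (1.0 - hit_rate) * query_cost
    return without_cache - with_cache


def break_even_hit_rate(query_cost, lookup_cost, maintenance_cost):
    """Hit rate above which the cache starts to pay for itself."""
    return (lookup_cost + maintenance_cost) / query_cost
```

With a hit rate of zero (clients never reuse queries) the gain is always
negative, which is the "pure loss" case; below the break-even hit rate it
is still a net loss.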

To the "server side" part, there can be some real attractions just in
centralizing indexing activities.  Despite all the features BTrees implement
to support scalability, when a bunch of clients are hammering on an indexing
tree simultaneously, write conflicts can still be a major bottleneck in ZODB
3.3 (== MVCC here, taking read conflicts out of it).  One comprehensible way
to address that is to feed index-update requests to a centralized process,
where the latter can sort out potential conflicts _before_ trying to commit
changes.  Leaving that to "automatic" conflict resolution (ACR) has inherent
limitations; e.g., ACR is limited to one bucket at a time, and can't deal at
all with conflicts that involve bucket splits.  If a single process were
doing the mutations, ACR wouldn't get involved; and if that process were on
the server box, it would certainly have quickest-possible access to all the
pieces that go into an index.
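
A minimal sketch of the single-writer idea, under loud assumptions: the
IndexWriter class and its method names are hypothetical, a dict of sets
stands in for a BTree-based index, and _apply_batch stands in for one
transaction commit. It shows only the shape of the thing: clients enqueue
update requests, and one process applies them in order.

```python
import queue
import threading


class IndexWriter:
    """Single writer that applies index updates sent by many clients.

    Hypothetical sketch: self.index stands in for a BTree-based index and
    each call to _apply_batch stands in for one transaction commit.
    """

    def __init__(self):
        self.index = {}                 # attribute value -> set of object ids
        self.requests = queue.Queue()   # thread-safe request queue
        self.commits = 0

    def submit(self, op, value, oid):
        # Clients call this instead of mutating the index themselves.
        self.requests.put((op, value, oid))

    def drain(self):
        """Apply everything queued so far as one batch; return its size."""
        batch = []
        while True:
            try:
                batch.append(self.requests.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._apply_batch(batch)
        return len(batch)

    def _apply_batch(self, batch):
        # One process applies the requests in arrival order, so no two
        # writers ever race on the same part of the index.
        for op, value, oid in batch:
            if op == "add":
                self.index.setdefault(value, set()).add(oid)
            elif op == "remove":
                oids = self.index.get(value)
                if oids is not None:
                    oids.discard(oid)
                    if not oids:
                        del self.index[value]
        self.commits += 1


# Four "clients" hammer on the same index key at once.
writer = IndexWriter()


def client(n):
    for i in range(3):
        writer.submit("add", "red", n * 10 + i)


threads = [threading.Thread(target=client, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
applied = writer.drain()
```

All twelve requests land in a single batch here, where twelve independent
client commits against the same bucket would have had to rely on ACR.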

After that, I suspect it gets a little complicated <wink>.

>> -- it doesn't have an expectation that it *can* unpickle objects of
>> user-defined classes, so rarely ever tries to (user-defined conflict
>> resolution methods are an exception, and the only one I can think of).
> [...]
>> to accept and service brute-force search requests.  You'd also need to
>> ensure that the code implementing your classes was available to this
>> program (not normally needed on a ZEO server box)

> Except in the case of conflict resolution, right?

Yes.  I have the impression (right or wrong) that user-defined conflict
resolution methods are rare, though (of course BTree conflict resolution
machinery is always present on a ZEO server).
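
For anyone who hasn't seen one, a user-defined conflict resolution method
looks roughly like the sketch below. The Counter class is a hypothetical
example, not code from this thread. In a real application it would subclass
persistent.Persistent; the base class is left out so the sketch runs
without ZODB installed. The module defining such a class is the code that
would have to be importable on the ZEO server box.

```python
class Counter:
    """Counter whose concurrent increments can be merged.

    Hypothetical example; a real one would subclass persistent.Persistent.
    """

    def __init__(self):
        self.value = 0

    def increment(self, n=1):
        self.value += n

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # Each state is a dict like the one __getstate__ would return.
        #   old_state:   what both transactions started from
        #   saved_state: what the other transaction already committed
        #   new_state:   what this transaction is trying to commit
        saved_delta = saved_state["value"] - old_state["value"]
        new_delta = new_state["value"] - old_state["value"]
        # Keep both increments instead of raising a conflict error.
        resolved = dict(new_state)
        resolved["value"] = old_state["value"] + saved_delta + new_delta
        return resolved
```

Starting from a value of 5, if one transaction commits 7 and another tries
to commit 6, the resolved state carries both increments.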

> I imagine that if we didn't handle brute-force searching in the server
> and indexing was implemented, then all that would be needed on the server
> was the code implementing the indexes themselves; the actual objects
> returned by a query would go on being opaque to the server.

I'm not really picturing what you have in mind.  From all I've seen, "the
problems" with indexing/cataloging ZODB objects have 90% to do with dealing
with high rates of concurrent index/catalog mutation, and little to do with
accessing (just reading) indices/catalogs.  It would help here to try to
define "the problem" you're hoping to address.