[ZODB-Dev] B-tree enhancement : interval search.

Thu Sep 4 18:55:38 EDT 2003

On Thu, Sep 04, 2003 at 03:48:58PM +0400, Vladimir Voznesensky wrote:
> Imagine, that we have a B-tree of N-letters words sorted in alphabetical 
> order.
> Let's we need to find all the strings satisfying a specific template 
> consisting of letters and "?" wildcards.
> For example, a?b or ?c?.
> The traditional approach for a?b is to find the first a?? word and 
> iterate sequentially while first letter is a.
> The traditional approach for ?c? is to iterate over all of words (or 
> call [a-z]c searching for every letter in the alphabet).
> Is it optimal? No.
> In many cases we can eliminate some pages and some parts of pages, for 
> example, [abc,aca] interval needs not to be checked for a?b. We can use 
> real distribution of words in specific B-tree to accelerate searching.

This is an awesome feature; we could *really* use this in IndexedCatalog
to speed up string queries, which is one of the top requests we get from
our users.

> I'm a new fellow in Zope. I'm ready to realize such an extention by 
> myself, but I do not understand, how to organize my interaction with  
> Zope folks: how to make feature requests, how to propagate my code, etc.
> As for Berkeley DB, SleepyCat Software disagreed to add my feature to 
> their source tree, because their customers are not interested in such 
> extention.

The main problem you'll have with the ZODB team is that everybody's
always so damned busy <wink>. I found the best way to work with them is
to persist -- if you don't get an answer, wait a bit, and repost. While
coding, follow the standards of the code around you -- you'll see there
are about 4000 lines of C code (and about 2000 of python) in the BTrees
package that should provide a good guide for whatever new implementation
you start.

You might want to post a short proposal indicating some basic choices
-- class names, file names, module organization -- to get a feel for
what sort of conventions people prefer, and the level of acceptance your
feature garners.

Having said all that, if you *do* have a hard time getting code
integrated, I don't imagine it would be hard to write an add-on for the
BTrees package in the ZODB distribution and ship it separately.

> My algorythm accelerates searching in many cases from times to 
> magnitudes (depending on template and data variations) comparing to 
> traditional approach of standard Berkeley DB API. I'd like to have and 
> use the same thing in ZODB.

Question: does the template necessarily have a fixed number of
characters, or is something like the * wildcard supported too?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL