[ZODB-Dev] Fast * wildcard search and word-part indexing + B-tree interval search.

Casey Duncan casey at zope.com
Wed Sep 17 13:36:53 EDT 2003


A possible roadblock to optimizing this is that there is currently no BTree 
variant expressly for string keys. There's a pair of BTree types for integer 
keys and a pair for arbitrary (comparible) Python object keys. The comparison 
operations you outlined are doable with object keys, but what optimizations 
you can get may be limited without a specific string-key btree variant. Were 
I to suggest the latter, I would expect a well-deserved beating from Tim and 
anyone else currently in line for maintenance of the btree code ;^)

It might be worthwhile to try an implementation that uses the existing btree 
range selection capabilities and see how it performs. I strongly suspect that 
performance for large corpuses will be limited mostly by disk IO speed. So an 
avenue to think about might be an algorithm which loads the fewest values 
into memory in the common case.

-Casey

On Tuesday 16 September 2003 05:49 am, Vladimir Voznesensky wrote:
> One addition to estimation function.
> 
> > Estimation function for [A,B) interval and ab?cd template is the 
> > following:
> > 1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in 
> > the interval does not match ab?cd.
> > 2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of 
> > the interval do match.
> > 3. Otherwise, possibly some of the keys do match, some-do not, and we 
> > must consider sub-division of the interval.
> 
> Shure, we can exclude intervals of the form:
> "ab<X>cd"<A[0:5] and B[0:5]<"ab<X+1>cd" where <X> is some character and 
> <X+1> -- it's successor.
> 
> So, estimation function for [A,B) interval and ab?cd template becomes 
> the following:
> 1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in the 
> interval does not match ab?cd.
> 2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of the 
> interval do match.
> 3. If first 5 letters of A>"ab<X>cd" and of B<"ab<X+1>cd", all the keys 
> in the inteval do not match.
> 4. Otherwise, possibly some of the keys do match, some-do not, and we 
> must consider sub-division of the interval.
> 
> Dear Christian, who is the maintainer of btree code?
> I'm interested in writing btree interval searching subroutine and 
> estimation functions hook in C by myself.
> I'm not interested in writing estimation functions specifically for 
> character strings (I follow another aims), but it seems that many folks are.
> I can advise and help to write string estimation functions (I've wrote 
> them for Berkeley DB) and appropriate indices for Zope IndexedCatalog.
> I have no large experience in Zope and (Python+C) development.
> I'm trying to find co-llaboratives and possible consumers of my work.
> 
> Farewell.
> 
> VV
> 
> 
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
> 
> ZODB-Dev mailing list  -  ZODB-Dev at zope.org
> http://mail.zope.org/mailman/listinfo/zodb-dev
> 




More information about the ZODB-Dev mailing list