[ZODB-Dev] Fast * wildcard search and word-part indexing + B-tree interval search.

Vladimir Voznesensky vovic at smtp.ru
Mon Sep 15 12:03:44 EDT 2003


  Hello.

As you (kiko) has mentioned above, there is a need for fast wildcard search.
Shure, every specific task has it's own optimal searching strategy.
But some tricks are possible that can decrease the number of records to 
be traversed.
My suggestion is the following:
Let's put every word and tail of word into our b-tree.
For example, word "mother" generates the following keys:
mother
other
ther
her
er
r
(There would be <number of words>*<number of letters in word> keys in 
our b-tree.)
Every ? and * wildcard template consists of ? wildcard templates 
interleaved with *s.
For example, *ab?cd*e?f?g consists of ?-templates:
ab?cd
e?f?g
At the first step, we choose ?-template for interval estimation search.
For example, ab?cd.
Better to choose ?-template that have minimum number of matches in our 
b-tree.
(There can be different techniques to make this estimations.)
At the second step, we iterate over records that match that ?-template 
(using interval estimation functions) and get the appropriates.
Estimation function for [A,B) interval and ab?cd template is the following:
1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in the 
interval does not match ab?cd.
2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of the 
interval do match.
3. Otherwise, possibly some of the keys do match, some-do not, and we 
must consider sub-division of the interval.

Farewell.

VV




More information about the ZODB-Dev mailing list