[ZODB-Dev] Re: Incremental Search Engine (was: RE: [BTrees] considerable optimization potential in"intersection")

Casey Duncan casey at zope.com
Mon Aug 9 15:57:41 EDT 2004


On Mon, 9 Aug 2004 19:39:31 +0200
Dieter Maurer <dieter at handshake.de> wrote:

> Casey Duncan wrote at 2004-8-9 10:08 -0400:
> >On Sat, 7 Aug 2004 14:27:31 +0200
> > ...
> >>   The principle for an and query of subqueries q1, ... qn
> >>   looks like:
> >> 
> >> 	# determine a hit >= candidate
> >> 	candidate = q1.getCandidate(candidate) # hit >= candidate
> >>   	while True:
> >> 	    foreach q:
> >> 	        (accepted, candidate) = q.classify(candidate)
> >> 	        if accepted == CANDIDATE: break # new candidate > old
> >> 	        candidate if accepted == END: return END # no more hits
> >> 	        # accepted == HIT; candidate unchanged
> >> 	    if accepted == CANDIDATE: break # restart with new candidate
> >> 	    # accepted == HIT; candidate is the next hit
> >> 	    return candidate
> >>>   This means: the first query provides a candidate.
> >
> >The candidate is a single record or a set?
> 
> A single document id.
> 
> >>   All subqueries are asked whether they can accept this candidate.
> >>   They may answer "END" (no more hits), "HIT" (candidate accepted),
> >>   "CANDIDATE" (a new candidate larger than the former one).
> >>   When all subqueries accept the candidate, it is the next hit.
> >
> >The principal of the algorithm I think makes sense to me, however I
> >am confused by the pseudocode above. What does it mean for a
> >candidate to be greater than another?
> 
> It has a larger document id (document ids are integers).

Ahh, that makes more sense given your increasing docid explanation
below. So I assume a higher candidate will always be one that has not
been seen, and the comparison is a cheap way to test that rather than
build a set of seen docids? Why does that require that docids are handed
out in increasing order? If during a query, it always iterated
candidates in increasing order (which is easy given they are kept in
order by docid in the catalog and index btrees) wouldn't that suffice?
 
> >Also, I'm confused by the third line from the end. How does breaking
> >there restart? Is there an omitted outer loop?
> 
> You are right! The "break" should be a "continue".

That makes sense.
 
> >Also the last line does a return, would that be a yield (i.e., would
> >this be a generator?)
> 
> The "return" is correct.
> 
> The code above may be called from an outer incremental search
> which does not request successive hits but skips large
> hit ranges (by passing it an appropriate candidate).

Ok, it would be useful to see an example of the outer and inner query
mechanism together for me to truely understand it. I suspect it would
answer my questions above.

-Casey



More information about the ZODB-Dev mailing list