[Zope-CVS] CVS: Products/ZCTextIndex/tests - testZCTextIndex.py:

Guido van Rossum guido@python.org
Mon, 6 May 2002 09:13:45 -0400

Update of /cvs-repository/Products/ZCTextIndex/tests
In directory cvs.zope.org:/tmp/cvs-serv15806/tests

Modified Files:
      Tag: TextIndexDS9-branch
Log Message:
This is the new Index.py, which stores scaled_int(w(d, t) / W(d)) in
_wordinfo[t], so that the inner loop in search() can be replaced by a
single constant scale factor, applied by weightedUnion.
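
A minimal sketch of the idea described above, not the actual Index.py:
the per-document normalization w(d, t) / W(d) is done once at indexing
time and stored as a scaled integer, so a query only has to multiply
each stored value by one per-term constant.  The value of SCALE_FACTOR,
the formula w(d, t) = 1 + ln f(d, t), and the helper names index_doc
and search_term are assumptions made for illustration; a plain
multiplication stands in for weightedUnion.

```python
import math

SCALE_FACTOR = 1024.0  # assumed value, for illustration only


def scaled_int(f, scale=SCALE_FACTOR):
    """Store a float as a rounded, scaled integer."""
    return int(round(f * scale))


def index_doc(wordinfo, docid, term_counts):
    """Hypothetical indexing step: store scaled_int(w(d, t) / W(d)) per term."""
    # Assumed term weight: w(d, t) = 1 + ln f(d, t)
    w = {t: 1.0 + math.log(n) for t, n in term_counts.items()}
    # Document length: W(d) = sqrt(sum of squared term weights)
    W = math.sqrt(sum(x * x for x in w.values()))
    for t, wdt in w.items():
        wordinfo.setdefault(t, {})[docid] = scaled_int(wdt / W)


def search_term(wordinfo, term, query_weight):
    """Hypothetical query step: one constant factor per term, no per-doc math."""
    return {doc: stored * query_weight
            for doc, stored in wordinfo.get(term, {}).items()}
```

Because the division by W(d) has already happened, the query-time loop
no longer touches any per-document state besides the stored integer.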

With a small change to _get_wdt() and one change in ZCTest (because
the results are now scaled differently), this now passes the test
suite, so it can't be all bad. :-)

On a corpus of 73420 email messages, a search for an uncommon term is
now too fast to measure reliably (less than 10 msec); a search for a
very common term (occurring 39379 times) takes 250 msec, another even
more common term (52286 hits) takes 320 msec, and combining these two
takes about the sum of those times.

Tim brought up some good issues with the new approach, so it needs to
be tweaked more, especially to avoid losing too much precision: the
values stored in _wordinfo are small integers, which is good since it
makes for small pickles, but they are often very small, which leaves
few significant digits to work with (and for collections with large
documents this will be even worse).  But first I'd like to work on
refactoring how the query engine and the query parser connect to the
index.
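
The precision concern can be illustrated with a rough sketch.  It
assumes SCALE_FACTOR is 1024.0 and that scaled_int rounds to the
nearest integer; relative_error is a hypothetical helper, not part of
the index.  The smaller the ratio w(d, t) / W(d), the fewer integer
steps are left to represent it, and a long document with many distinct
terms pushes the ratio toward zero.

```python
SCALE_FACTOR = 1024.0  # assumed value, for illustration only


def scaled_int(f, scale=SCALE_FACTOR):
    """Store a float as a rounded, scaled integer."""
    return int(round(f * scale))


def relative_error(x):
    """Hypothetical helper: relative error after a store/load round trip."""
    stored = scaled_int(x)
    return abs(stored / SCALE_FACTOR - x) / x


# A weight of 0.5 survives exactly; a weight of 0.01 is stored as 10
# and is off by roughly 2%; a weight of 0.0004 rounds to 0 and is lost.
for x in (0.5, 0.01, 0.0004):
    print(x, scaled_int(x), relative_error(x))
```

Under these assumptions, any normalized weight below about 0.0005
rounds to zero, so such a term stops contributing to the score at all.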

=== Products/ZCTextIndex/tests/testZCTextIndex.py => ===
                 d[doc] = scaled_int(score)
             for doc, score in r:
-                score = scaled_int(float(score) / wq)
+                score = scaled_int(float(score / SCALE_FACTOR) / wq)
                 self.assert_(0 <= score <= SCALE_FACTOR)
                 eq(d[doc], score)