[Zope-CVS] CVS: Products/ZCTextIndex - Index.py:1.1.2.13

Tim Peters tim.one@comcast.net
Fri, 3 May 2002 01:16:11 -0400


Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv15170

Modified Files:
      Tag: TextIndexDS9-branch
	Index.py 
Log Message:
doc_term_weight() and its use in _get_frequencies():  The former
returned a scaled int, roughly 256 * true_value.  The latter then squared
it, giving roughly 65536 * true_value**2.  It doesn't take an implausible
number of those before the sum overflows a signed 32-bit int (each
squared term carries an artificial factor of 2**16).  So changed the
former to return plain old true_value as a float, scaling only when
storing the per-term weights and *after* the sqrt of the sum is taken.
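
To make the arithmetic above concrete, here is a rough sketch that compares the old and new accumulation. It is not the code in Index.py: scaled_int() is not shown in this diff, so the version below (multiply by 256 and round) is an assumption based on "roughly 256 * true_value", and the names SCALE, INT32_MAX, old_sum_of_squares() and new_doc_weight() are made up for illustration.

```python
import math

SCALE = 256              # assumed scale factor ("roughly 256 * true_value")
INT32_MAX = 2**31 - 1    # largest signed 32-bit int


def scaled_int(f, scale=SCALE):
    """Hypothetical stand-in for the module's scaled_int()."""
    return int(round(f * scale))


def doc_term_weight(count):
    """w(d, t) = 1 + log f(d, t), returned as a plain float."""
    return 1.0 + math.log(count)


def old_sum_of_squares(counts):
    """Old behavior: scale each weight first, then square and sum.

    Every squared term carries an extra factor of 256**2 == 2**16.
    """
    return sum(scaled_int(doc_term_weight(c)) ** 2 for c in counts)


def new_doc_weight(counts):
    """New behavior: sum squares as floats, scale once after the sqrt."""
    wsquares = 0.0
    for c in counts:
        f = doc_term_weight(c)
        wsquares += f * f
    return scaled_int(math.sqrt(wsquares))


# A document with 40,000 distinct terms, each appearing once.
# Each weight is 1.0, so each old-style squared term is 65536.
counts = [1] * 40000
old_total = old_sum_of_squares(counts)
new_weight = new_doc_weight(counts)

print(old_total > INT32_MAX)   # the old sum no longer fits in 32 bits
print(new_weight)              # the new result stays small
```

With weights of 1.0, the old sum crosses 2**31 at 32768 terms (2**31 / 2**16), which is why the float accumulation is the safer choice. Current Python ints do not overflow, so the sketch compares against INT32_MAX instead of relying on an exception.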


=== Products/ZCTextIndex/Index.py 1.1.2.12 => 1.1.2.13 ===
         for wid in wids:
             d[wid] = d.get(wid, 0) + 1
-        Wsquares = 0
+        Wsquares = 0.
         freqs = []
         for wid, count in d.items():
             f = doc_term_weight(count)
-            Wsquares += f ** 2
-            freqs.append((wid, f))
-        return freqs, int(math.sqrt(Wsquares))
+            Wsquares += f * f
+            freqs.append((wid, scaled_int(f)))
+        return freqs, scaled_int(math.sqrt(Wsquares))
 
     def _add_wordinfo(self, wid, f, docid):
         try:
@@ -164,7 +164,7 @@
 def doc_term_weight(count):
     """Return the doc-term weight for a term that appears count times."""
     # implements w(d, t) = 1 + log f(d, t)
-    return scaled_int(1 + math.log(count))
+    return 1. + math.log(count)
 
 def query_term_weight(term_count, num_items):
     """Return the query-term weight for a term,