# [Zope-CVS] CVS: Products/ZCTextIndex - Index.py:1.1.2.13

**Tim Peters**
tim.one@comcast.net

*Fri, 3 May 2002 01:16:11 -0400*

Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv15170
Modified Files:
Tag: TextIndexDS9-branch
Index.py
Log Message:
doc_term_weight() and its use in _get_frequencies(): The former
returned a scaled int, roughly 256 * true_value. The latter then squared
it, giving roughly 65536 * true_value**2. It won't take an implausible
number of those before the sum overflows a signed 32-bit int (each int
is an artificial factor of 2**16 times "too big"). So changed the former
to return plain old true_value as a float, scaling only for storing and
*after* the sqrt of the sum is taken.
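
To make the arithmetic in the log message concrete, here is a minimal sketch, not the actual Index.py code. It assumes a scale factor of 256 (taken from "roughly 256 * true_value" above) and a hypothetical stand-in for `scaled_int()` that rounds to the nearest int; the real helper may differ. Python 3 ints never overflow, so the sketch only compares the old-style sum against the signed 32-bit limit.

```python
import math

SCALE = 256              # assumption: taken from "roughly 256 * true_value"
INT32_MAX = 2**31 - 1    # largest signed 32-bit int


def scaled_int(f, scale=SCALE):
    # Hypothetical stand-in for the real helper: scale, then round to an int.
    return int(round(f * scale))


def doc_term_weight(count):
    # w(d, t) = 1 + log f(d, t), returned as a plain float (new behavior).
    return 1.0 + math.log(count)


def old_sum_of_squares(counts):
    # Old behavior: each weight is scaled *before* squaring, so every
    # term carries an extra factor of 256**2 == 2**16.
    return sum(scaled_int(doc_term_weight(c)) ** 2 for c in counts)


def new_doc_weight(counts):
    # New behavior: accumulate in floating point and scale once,
    # after taking the square root of the sum.
    wsquares = 0.0
    for c in counts:
        f = doc_term_weight(c)
        wsquares += f * f
    return scaled_int(math.sqrt(wsquares))


# A document with 40000 distinct terms, each appearing once (weight 1.0).
counts = [1] * 40000
old_total = old_sum_of_squares(counts)
new_weight = new_doc_weight(counts)

print(old_total > INT32_MAX)   # the old sum no longer fits in 32 bits
print(new_weight)              # the new result stays small
```

With every weight equal to 1.0, each old-style squared term is 65536, so 32768 such terms already reach 2**31. The new version scales only the final square root, which stays well within range.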
=== Products/ZCTextIndex/Index.py 1.1.2.12 => 1.1.2.13 ===
         for wid in wids:
             d[wid] = d.get(wid, 0) + 1
-        Wsquares = 0
+        Wsquares = 0.
         freqs = []
         for wid, count in d.items():
             f = doc_term_weight(count)
-            Wsquares += f ** 2
-            freqs.append((wid, f))
-        return freqs, int(math.sqrt(Wsquares))
+            Wsquares += f * f
+            freqs.append((wid, scaled_int(f)))
+        return freqs, scaled_int(math.sqrt(Wsquares))
     def _add_wordinfo(self, wid, f, docid):
         try:
@@ -164,7 +164,7 @@
 def doc_term_weight(count):
     """Return the doc-term weight for a term that appears count times."""
     # implements w(d, t) = 1 + log f(d, t)
-    return scaled_int(1 + math.log(count))
+    return 1. + math.log(count)
 def query_term_weight(term_count, num_items):
     """Return the query-term weight for a term,