[Zope-CVS] CVS: Products/ZCTextIndex - SetOps.py:1.1

Tim Peters tim.one@comcast.net
Tue, 14 May 2002 20:05:44 -0400


Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv18209

Added Files:
	SetOps.py 
Log Message:
Squash bug duplication by moving the clever mass-union and mass-
intersection gimmicks into their own functions with their own test suite.

This turned up two bugs:

1. The mass weighted union gimmick was incorrect when passed a list with
   a single mapping.  In that case, it neglected to multiply the mapping
   by the given weight.

2. The underlying weighted{Intersection, Union} code does something crazy
   if you pass it weights less than 0.  I had vaguely hoped to be able
   to subtract scores by passing 1 and -1 as weights, but this doesn't
   work.  It's hard to say exactly what it does then.  The line
       weightedUnion(IIBTree(), mapping, 1, -2)
   seems to return a mapping with the same keys, but *all* of whose
   values are -2, regardless of the original mapping's values.


=== Added File Products/ZCTextIndex/SetOps.py ===
##############################################################################
#
# Copyright (c) 2001, 2002 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################

"""SetOps -- Weighted intersections and unions applied to many inputs."""

from BTrees.IIBTree import IIBTree, weightedIntersection, weightedUnion

from Products.ZCTextIndex.NBest import NBest

def mass_weightedIntersection(L):
    "A list of (mapping, weight) pairs -> their weightedIntersection IIBTree."
    if not L:
        return IIBTree()
    # Intersect with smallest first.
    L = L[:]    # don't mutate the caller's L
    L.sort(lambda x, y: cmp(len(x[0]), len(y[0])))
    x, w = L[0]
    dummy, result = weightedUnion(IIBTree(), x, 1, w)
    for x, w in L[1:]:
        dummy, result = weightedIntersection(result, x, 1, w)
    return result

def mass_weightedUnion(L):
    "A list of (mapping, weight) pairs -> their weightedUnion IIBTree."
    if not L:
        return IIBTree()
    if len(L) == 1:
        # Have to do a union in order to get the input's values
        # multiplied by the weight.
        x, weight = L[0]
        dummy, result = weightedUnion(IIBTree(), x, 1, weight)
        return result
    # Balance unions as closely as possible, smallest to largest.
    assert len(L) > 1
    merge = NBest(len(L))
    for x, weight in L:
        merge.add((x, weight), len(x))
    while len(merge) > 1:
        # Merge the two smallest so far, and add back to the queue.
        (x, wx), dummy = merge.pop_smallest()
        (y, wy), dummy = merge.pop_smallest()
        dummy, z = weightedUnion(x, y, wx, wy)
        merge.add((z, 1), len(z))
    (result, weight), dummy = merge.pop_smallest()
    return result