[Zope-CVS] CVS: Products/ZCTextIndex - QueryEngine.py:1.1.2.4

Tim Peters tim.one@comcast.net
Sun, 5 May 2002 22:04:52 -0400


Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv6102

Modified Files:
      Tag: TextIndexDS9-branch
	QueryEngine.py 
Log Message:
When doing multiple ORs, do a balanced merge.  For example, if we had
8 result sets with 1 element each, and they were pairwise disjoint, we
were doing:

    merge 1 with 1
    merge 1 with 2
    merge 1 with 3
    merge 1 with 4
    merge 1 with 5
    merge 1 with 6
    merge 1 with 7

Now we'll do:

    merge 1 with 1    (4 times)
    merge 2 with 2    (2 times)
    merge 4 with 4    (once)


=== Products/ZCTextIndex/QueryEngine.py 1.1.2.3 => 1.1.2.4 ===
 
 from BTrees.IIBTree import difference, weightedIntersection, weightedUnion
+from Products.ZCTextIndex.NBest import NBest
 
 class QueryError(Exception):
     pass
@@ -46,16 +47,25 @@
                     dummy, notset = weightedUnion(notset, x)
                 set = difference(set, notset)
             return set
+
         if kind == "OR":
-            L = []
-            for subtree in tree.getValue():
-                L.append(self.executeQuery(index, subtree))
-            L.sort(lambda x, y: cmp(len(x), len(y)))
-            set = L[0]
-            for x in L[1:]:
-                dummy, set = weightedUnion(set, x)
-            return set
+            # Balance unions as closely as possible, smallest to largest.
+            allofem = tree.getValue()
+            merge = NBest(len(allofem))
+            for subtree in allofem:
+                result = self.executeQuery(index, subtree)
+                merge.add(result, len(result))
+            while len(merge) > 1:
+                # Merge the two smallest so far, and add back to the queue.
+                x, dummy = merge.pop_smallest()
+                y, dummy = merge.pop_smallest()
+                dummy, z = weightedUnion(x, y)
+                merge.add(z, len(z))
+            result, dummy = merge.pop_smallest()
+            return result
+
         if kind == "ATOM":
             return index.search(tree.getValue())
+
         if kind == "NOT":
             raise QueryError, "NOT operator must occur right after AND"