[Zope-CVS] CVS: Products/ZCTextIndex/tests - testIndex.py: testQueryParser.py:

Guido van Rossum guido@python.org
Fri, 10 May 2002 21:00:36 -0400

Update of /cvs-repository/Products/ZCTextIndex/tests
In directory cvs.zope.org:/tmp/cvs-serv16638/tests

Modified Files:
      Tag: TextIndexDS9-branch
	testIndex.py testQueryParser.py 
Log Message:
Add primitive phrase search.

This uses a "compression" scheme for lists of ints that I dubbed
"WidCode", and which uses an encoding somewhat similar to UTF-8.  It
only saves about 20 percent in pickle size over the (binary) pickle of
the list, but has the special property that you can use the string's
find() method to check whether a phrase occurs in a document.  Because
docwords now records *all* wids, in document order, rather than only a
list of unique wids, the WidCode-encoded list is probably *longer*
than what was stored in docwords before, but I hope it's not that much
longer.  The performance of WidCode could be improved by doing a
pre-scan over part of the corpus and assigning wids by frequency of
occurrence (the most frequent word gets wid 1, and so on).
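
To make the find() property concrete, here is a minimal sketch of a
UTF-8-like encoding for lists of wids.  It is not the WidCode module
itself: the function names encode(), decode() and contains_phrase(),
the exact byte layout (high bit set only on the first byte of each
wid, 7 data bits per byte), and the use of Python 3 bytes are all
assumptions made for illustration.

```python
def encode(wids):
    """Encode a list of non-negative ints as a byte string (hypothetical layout)."""
    out = bytearray()
    for wid in wids:
        if wid < 0:
            raise ValueError("wids must be non-negative")
        # Split the wid into 7-bit groups, least significant first.
        groups = [wid & 0x7F]
        wid >>= 7
        while wid:
            groups.append(wid & 0x7F)
            wid >>= 7
        groups.reverse()          # most significant group first
        groups[0] |= 0x80         # only the lead byte has the high bit set
        out.extend(groups)
    return bytes(out)


def decode(code):
    """Inverse of encode()."""
    wids = []
    for byte in code:
        if byte & 0x80:
            wids.append(byte & 0x7F)              # lead byte starts a new wid
        elif not wids:
            raise ValueError("continuation byte without a lead byte")
        else:
            wids[-1] = (wids[-1] << 7) | byte     # continuation byte
    return wids


def contains_phrase(doc_code, phrase_wids):
    """Check whether the wids occur consecutively in an encoded document."""
    needle = encode(phrase_wids)
    if not needle:
        return False
    start = doc_code.find(needle)
    while start >= 0:
        end = start + len(needle)
        # The needle begins with a lead byte, so a match always starts on a
        # wid boundary.  It must also end on one: either at the end of the
        # document or right before another lead byte.
        if end == len(doc_code) or doc_code[end] & 0x80:
            return True
        start = doc_code.find(needle, start + 1)
    return False
```

In this layout small wids take a single byte, which is why assigning
wids by frequency would shrink the encoding.  The boundary check in
contains_phrase() is needed only because, in this particular sketch,
the encoding of one wid can be a prefix of the encoding of a larger
one.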

Also still to do: change the query parser to recognize "words in
quotes" for phrase search.  It currently takes any sequence of words
without operators as a phrase search, i.e. the default operator is now
phrase search; but that's probably not what you want!  I did add a new
parse tree node, PhraseNode, which encodes a phrase search; there's
also a new Index method search_phrase().
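
As a rough idea of what recognizing "words in quotes" could look like,
here is a small standalone sketch that splits a query string into
quoted phrases and bare words.  It is not the query parser: the
function split_query() and its regular expression are hypothetical,
it does not build ParseTree nodes, and it treats operators such as
AND and NOT as ordinary words.  The "PHRASE" and "ATOM" labels only
mirror the nodeType() strings used in the tests.

```python
import re

# Either a double-quoted run (captured without the quotes) or a run of
# non-whitespace characters.
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def split_query(query):
    """Return a list of (kind, text) pairs for a query string."""
    tokens = []
    for quoted, bare in _TOKEN_RE.findall(query):
        if quoted:
            tokens.append(("PHRASE", quoted))
        elif bare:
            tokens.append(("ATOM", bare))
        # An empty pair of quotes yields neither and is skipped.
    return tokens
```

With this split, a query such as foo "bar baz" yields one ATOM and one
PHRASE, so unquoted words would no longer have to default to phrase
search.  An unbalanced quote is simply kept as part of a bare word.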

=== Products/ZCTextIndex/tests/testIndex.py => ===
         self.assertEqual(len(self.index._wordinfo), 5)
         self.assertEqual(len(self.index._docwords), 1)
-        self.assertEqual(len(self.index._get_undoinfo(DOCID)), 5)
+##        self.assertEqual(len(self.index._get_undoinfo(DOCID)), 5)
         wids = self.lexicon.termToWordIds("repeat")
         self.assertEqual(len(wids), 1)
         repititive_wid = wids[0]

=== Products/ZCTextIndex/tests/testQueryParser.py => ===
 from Products.ZCTextIndex.ParseTree import \
-     ParseError, ParseTreeNode, OrNode, AndNode, NotNode, AtomNode
+     ParseError, ParseTreeNode, OrNode, AndNode, NotNode, AtomNode, PhraseNode
 class TestQueryParser(TestCase):
     def compareParseTrees(self, got, expected):
         self.assertEqual(isinstance(got, ParseTreeNode), 1)
         self.assertEqual(got.__class__, expected.__class__)
-        if isinstance(got, AtomNode):
+        if isinstance(got, PhraseNode):
+            self.assertEqual(got.nodeType(), "PHRASE")
+            self.assertEqual(got.getValue(), expected.getValue())
+        elif isinstance(got, AtomNode):
             self.assertEqual(got.nodeType(), "ATOM")
             self.assertEqual(got.getValue(), expected.getValue())
         elif isinstance(got, NotNode):
@@ -65,9 +68,9 @@
         self.expect("a AND not b",
                     AndNode([AtomNode("a"), NotNode(AtomNode("b"))]))
-        self.expect("foo bar", OrNode([AtomNode("foo"), AtomNode("bar")]))
+        self.expect("foo bar", PhraseNode("foo bar"))
-        self.expect("((foo bar))", OrNode([AtomNode("foo"), AtomNode("bar")]))
+        self.expect("((foo bar))", PhraseNode("foo bar"))
     def testParseFailures(self):