[Zope-CVS] CVS: Products/ZCTextIndex - QueryParser.py:1.7

Guido van Rossum guido@python.org
Mon, 20 May 2002 12:03:56 -0400


Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv10536

Modified Files:
	QueryParser.py 
Log Message:
QueryParser.py:

- Rephrased the description of the grammar, pointing out that the
  lexicon decides on globbing syntax.

- Refactored term and atom parsing (moving atom parsing into a
  separate method).  The previously checked-in version accidentally
  accepted some invalid forms like ``foo AND -bar''; this is fixed.

tests/testQueryParser.py:

- Each test is now in a separate method; this produces more output
  (alas) but makes pinpointing the errors much simpler.

- Added some tests catching ``foo AND -bar'' and similar.

- Added an explicit test class for the handling of stopwords.  The
  "and/" test no longer has to check self.__class__.

- Some refactoring of the TestQueryParser class; the utility methods
  are now in a base class TestQueryParserBase, in a different order;
  compareParseTrees() now shows the parse tree it got when raising an
  exception.  The parser is now self.parser instead of self.p (see
  below).

tests/testZCTextIndex.py:

- setUp() no longer needs to assign to self.p; the parser is
  consistently called self.parser now.



=== Products/ZCTextIndex/QueryParser.py 1.6 => 1.7 ===
 
 + A sequence of characters not containing whitespace or parentheses or
-  double quotes, and not equal to one of the key words 'AND', 'OR', 'NOT'; or
+  double quotes, and not equal (ignoring case) to one of the key words
+  'AND', 'OR', 'NOT'; or
 
-+ A non-empty string enclosed in double quotes.  The interior of the string
-  can contain whitespace, parentheses and key words.
-
-In addition, an ATOM may optionally be preceded by a hyphen, meaning
-that it must not be present.
-
-An unquoted ATOM may also end in a star.  This is a primitive
-"globbing" function, meaning to search for any word with a given
-prefix.
++ A non-empty string enclosed in double quotes.  The interior of the
+  string can contain whitespace, parentheses and key words, but not
+  quotes.
+
++ A hyphen followed by one of the two forms above, meaning that it
+  must not be present.
+
+An unquoted ATOM may also contain globbing characters.  Globbing
+syntax is defined by the lexicon; for example "foo*" could mean any
+word starting with "foo".
 
 When multiple consecutive ATOMs are found at the leaf level, they are
 connected by an implied AND operator, and an unquoted leading hyphen
@@ -202,32 +204,37 @@
             tree = self._parseOrExpr()
             self._require(_RPAREN)
         else:
-            atoms = [self._get(_ATOM)]
-            while self._peek(_ATOM):
-                atoms.append(self._get(_ATOM))
             nodes = []
-            nots = []
-            for a in atoms:
-                words = self._lexicon.parseTerms(a)
-                if not words:
-                    self._ignored.append(a)
-                    continue # Only stopwords
-                if len(words) > 1:
-                    n = ParseTree.PhraseNode(" ".join(words))
-                elif self._lexicon.isGlob(words[0]):
-                    n = ParseTree.GlobNode(words[0])
-                else:
-                    n = ParseTree.AtomNode(words[0])
-                if a[0] == "-":
-                    n = ParseTree.NotNode(n)
-                    nots.append(n)
-                else:
-                    nodes.append(n)
+            nodes = [self._parseAtom()]
+            while self._peek(_ATOM):
+                nodes.append(self._parseAtom())
+            nodes = filter(None, nodes)
             if not nodes:
-                return None # Only stowords
-            nodes.extend(nots)
+                return None # Only stopwords
+            structure = [(isinstance(nodes[i], ParseTree.NotNode), i, nodes[i])
+                         for i in range(len(nodes))]
+            structure.sort()
+            nodes = [node for (bit, index, node) in structure]
+            if isinstance(nodes[0], ParseTree.NotNode):
+                raise ParseTree.ParseError(
+                    "a term must have at least one positive word")
             if len(nodes) == 1:
-                tree = nodes[0]
-            else:
-                tree = ParseTree.AndNode(nodes)
+                return nodes[0]
+            tree = ParseTree.AndNode(nodes)
+        return tree
+
+    def _parseAtom(self):
+        term = self._get(_ATOM)
+        words = self._lexicon.parseTerms(term)
+        if not words:
+            self._ignored.append(term)
+            return None
+        if len(words) > 1:
+            tree = ParseTree.PhraseNode(words)
+        elif self._lexicon.isGlob(words[0]):
+            tree = ParseTree.GlobNode(words[0])
+        else:
+            tree = ParseTree.AtomNode(words[0])
+        if term[0] == "-":
+            tree = ParseTree.NotNode(tree)
         return tree
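

The reordering step in the patch above (negated nodes moved to the end of a term, and a term with no positive word rejected) can be sketched in isolation. This is only an illustrative sketch for a current Python 3, not the checked-in code: the classes Atom, Not, And and ParseError and the function order_term are hypothetical stand-ins for the ParseTree node classes and the body of the term parser. It relies on sorted() being stable instead of the explicit (bit, index, node) decoration, and it uses a list comprehension in place of filter(None, nodes), since filter() returns a lazy iterator in Python 3 and the "if not nodes" check would otherwise never fire.

```python
from dataclasses import dataclass


class ParseError(Exception):
    """Hypothetical stand-in for ParseTree.ParseError."""


@dataclass
class Atom:
    """Hypothetical stand-in for ParseTree.AtomNode."""
    word: str


@dataclass
class Not:
    """Hypothetical stand-in for ParseTree.NotNode."""
    child: Atom


@dataclass
class And:
    """Hypothetical stand-in for ParseTree.AndNode."""
    children: list


def order_term(nodes):
    """Combine the atoms of one term, mirroring the logic in the patch."""
    # Drop atoms that parsed to nothing (only stopwords).
    nodes = [n for n in nodes if n is not None]
    if not nodes:
        return None  # Only stopwords
    # False sorts before True, and sorted() is stable, so positive nodes
    # keep their relative order and negated nodes follow in theirs.
    nodes = sorted(nodes, key=lambda n: isinstance(n, Not))
    # If a negated node is still first, the term had no positive word.
    if isinstance(nodes[0], Not):
        raise ParseError("a term must have at least one positive word")
    if len(nodes) == 1:
        return nodes[0]
    return And(nodes)
```

Under these assumptions, a term consisting only of -bar raises ParseError, which is how the right-hand side of ``foo AND -bar'' ends up rejected, while a term such as ``-b a c'' yields an And node with children a, c, NOT b.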