[Zope] patch: "FullTextIndex" fix for ZCatalog weirdness

Michael Halle halazar@media.mit.edu
Tue, 21 Sep 1999 02:16:14 -0400


ZCatalog currently does unexpected things when a query contains
"stop words", and that was a show-stopper for my application.
While discussion on the mailing list has included possible big-picture
fixes, I chose a simple, quick solution to the problem: I added a new
index type, "FullTextIndex", to augment the existing "TextIndex" and
"FieldIndex".

With "FullTextIndex", the stop word dictionary is set to {}, so every
word is indexed, stop words included.  Noise words will inflate the
index and produce spurious matches, but at least the results are intuitive.
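
To make the trade-off concrete, here is a minimal sketch of the idea in
modern Python.  It is a toy, not the real UnTextIndex: the class name,
the four-word stop list, and the choice to have stop-word queries return
nothing are all assumptions made for illustration.  It does mirror the
patch in one respect: None means "use the default stop words", while an
explicit {} means "index everything".

```python
# Hypothetical stand-in for the real stop word list (illustration only).
DEFAULT_STOP_WORDS = {"the": None, "of": None, "a": None, "and": None}


class TinyTextIndex:
    """Toy inverted index; NOT the real UnTextIndex."""

    def __init__(self, stop_word_dict=None):
        # None means "use the default list"; an explicit {} means
        # "no stop words at all", which is what FullTextIndex asks for.
        if stop_word_dict is None:
            stop_word_dict = DEFAULT_STOP_WORDS
        self._stop = stop_word_dict
        self._index = {}  # word -> set of document ids

    def index_doc(self, doc_id, text):
        for word in text.lower().split():
            if word in self._stop:
                continue  # stop words never reach the index
            self._index.setdefault(word, set()).add(doc_id)

    def search(self, word):
        return sorted(self._index.get(word.lower(), set()))


titles = {1: "The Art of War", 2: "War and Peace", 3: "The The"}

text_index = TinyTextIndex()                   # behaves like "TextIndex"
full_index = TinyTextIndex(stop_word_dict={})  # behaves like "FullTextIndex"
for doc_id, title in titles.items():
    text_index.index_doc(doc_id, title)
    full_index.index_doc(doc_id, title)

print(text_index.search("the"))   # [] : the word was never indexed
print(full_index.search("the"))   # [1, 3]
print(len(text_index._index), len(full_index._index))  # 3 6
```

In this toy the full index holds twice as many distinct words for three
short titles, which is the inflation mentioned above; in exchange, a
title made up entirely of stop words can still be found.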

"FullTextIndex" makes sense for titles and keyword fields;  it doesn't
make sense for long documents.  

Diffs against Zope-2.0.0 follow.  The patch also includes the sort-order
fix posted to this list.  Apply with "patch -p0" from the top-level
Zope directory.

Remember kids, this isn't an official patch!


Michael Halle
mhalle@media.mit.edu
-------------------------------------------------------------------------------

*** lib/python/Products/ZCatalog/ZCatalog.py.~1~	Wed Sep  1 14:28:58 1999
--- lib/python/Products/ZCatalog/ZCatalog.py	Tue Sep 21 01:05:46 1999
***************
*** 195,201 ****
          self._catalog.addIndex('id', 'FieldIndex')
  
          self._catalog.addColumn('title')
!         self._catalog.addIndex('title', 'TextIndex')
  
          self._catalog.addColumn('meta_type')
          self._catalog.addIndex('meta_type', 'FieldIndex')
--- 195,201 ----
          self._catalog.addIndex('id', 'FieldIndex')
  
          self._catalog.addColumn('title')
!         self._catalog.addIndex('title', 'FullTextIndex')
  
          self._catalog.addColumn('meta_type')
          self._catalog.addIndex('meta_type', 'FieldIndex')
*** lib/python/Products/ZCatalog/Catalog.py.~1~	Wed Sep  1 11:40:24 1999
--- lib/python/Products/ZCatalog/Catalog.py	Tue Sep 21 01:09:35 1999
***************
*** 253,258 ****
--- 253,260 ----
              indexes[name] = UnIndex.UnIndex(name)
          elif type == 'TextIndex':
              indexes[name] = UnTextIndex.UnTextIndex(name)
+         elif type == 'FullTextIndex':
+             indexes[name] = UnTextIndex.UnTextIndex(name, stop_word_dict={})
  
          self.indexes = indexes
  
***************
*** 408,414 ****
                  rs=data.items()
                  append(LazyMap(self.instantiate, rs))
              else:
!                 for k, intset in sort_index.items():
                      append((k,LazyMap(self.__getitem__, intset)))
          elif rs:
              if sort_index is None:
--- 410,416 ----
                  rs=data.items()
                  append(LazyMap(self.instantiate, rs))
              else:
!                 for k, intset in sort_index._index.items():
                      append((k,LazyMap(self.__getitem__, intset)))
          elif rs:
              if sort_index is None:
*** lib/python/Products/ZCatalog/catalogIndexes.dtml.~1~	Thu Aug 26 10:20:43 1999
--- lib/python/Products/ZCatalog/catalogIndexes.dtml	Tue Sep 21 00:56:43 1999
***************
*** 29,34 ****
--- 29,35 ----
  
  of Index Type: <select name="type">
           <option value="TextIndex">TextIndex</option>
+          <option value="FullTextIndex">FullTextIndex</option>
           <option value="FieldIndex">FieldIndex</options>
           </select>
  <input name="manage_addIndex:method" type=submit value=" Add ">
106c106,107
<     def __init__(self, id=None, ignore_ex=None, call_methods=None):
---
>     def __init__(self, id=None, ignore_ex=None, call_methods=None,
>                  stop_word_dict=None):
119a121,122
>           'stop_word_dict' -- A dictionary of stop words.
> 
121c124
<         if not id==ignore_ex==call_methods==None:
---
>         if not stop_word_dict==id==ignore_ex==call_methods==None:
127,128c130,133
<             self._syn=stop_word_dict
< 
---
>             if stop_word_dict is None:
>                 self._syn=default_stop_word_dict
>             else:
>                 self._syn=stop_word_dict
132d136
< 
630,631c634,635
< stop_word_dict={}
< for word in stop_words: stop_word_dict[word]=None
---
> default_stop_word_dict={}
> for word in stop_words: default_stop_word_dict[word]=None