[Zope-dev] Modifying Splitter.c to search on '+' & '#', and single letter words

Michel Pelletier michel@digicool.com
Wed, 25 Jul 2001 22:25:53 -0700


Harry Wilkinson wrote:
> 
> I have two problems with getting ZCatalog to search for what I need:
> 
> 1) Need to be able to search for words like 'J++' and 'C#'
>        - this is relatively simple to do by editing Splitter.c a little
> and recompiling
> 2) Need to be able to search for single-letter words like 'C'
>        - this is easy to modify Splitter.c to accommodate, but causes
> errors in GlobbingLexicon.py, even though the vocabulary is standard
> 
> So far I have solved problem (1) by changing the contents of Splitter.c,
> but that's a bit messy.  Currently I don't know of an alternative
> though.
> 
> I have modified Splitter.c so it indexes the extra characters, and
> reduced the minimum word length to 1, which works fine when indexing,
> and I can see all the symbol-inclusive words and single-letter words in
> the vocabulary.  Unfortunately, any search on a single-letter word gives
> an IndexError, "String out of range".

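This is because GlobbingLexicon never anticipated single-letter
patterns: for the first character it always builds a digram from
pattern[i] and pattern[i+1], and pattern[i+1] does not exist when the
pattern is only one character long.  This is a bug.  Try this
(untested) quick patch: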

Index: GlobbingLexicon.py
===================================================================
RCS file: /cvs-repository/Zope2/lib/python/SearchIndex/GlobbingLexicon.py,v
retrieving revision 1.9
diff -c -r1.9 GlobbingLexicon.py
*** GlobbingLexicon.py	2001/04/02 18:19:45	1.9
--- GlobbingLexicon.py	2001/07/26 05:21:48
***************
*** 221,226 ****
--- 221,229 ----
  
              if i == 0:
                  digrams.insert(i, (self.eow + pattern[i]) )
+                 if len(pattern) == 1:
+                     digrams.append( (pattern[i] + self.eow) )
+                     break
                  digrams.append((pattern[i] + pattern[i+1]))
              else:
                  try:


> I am stuck on problem (2) and don't know how to avoid the errors arising
> in GlobbingLexicon.py without editing in some kind of hack to get around
> it. 

That's exactly what this patch does.
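
To see the failure and the fix in isolation, here is a minimal standalone sketch of the digram loop. It is not the actual GlobbingLexicon code: the function name pattern_digrams, the patched flag, the '$' end-of-word marker standing in for self.eow, and the wildcard characters are all assumptions made for illustration, and the body of the else branch (cut off in the diff above) is a guess.

```python
EOW = "$"                # assumed end-of-word marker, stands in for self.eow
WILDCARDS = ("*", "?")   # assumed wildcard characters


def pattern_digrams(pattern, patched=True):
    """Build the digrams used to look up a search pattern (illustrative only)."""
    digrams = []
    for i in range(len(pattern)):
        if pattern[i] in WILDCARDS:
            continue
        if i == 0:
            # Mark the start of the word.
            digrams.insert(i, EOW + pattern[i])
            if patched and len(pattern) == 1:
                # The patch: a one-character pattern has no pattern[i+1],
                # so close the word here and stop.
                digrams.append(pattern[i] + EOW)
                break
            # Without the patch this raises IndexError for a 1-char pattern.
            digrams.append(pattern[i] + pattern[i + 1])
        else:
            # Guessed shape of the branch that the diff truncates.
            try:
                if pattern[i + 1] not in WILDCARDS:
                    digrams.append(pattern[i] + pattern[i + 1])
            except IndexError:
                digrams.append(pattern[i] + EOW)
    return digrams


if __name__ == "__main__":
    print(pattern_digrams("C"))
    print(pattern_digrams("C#"))
    try:
        pattern_digrams("C", patched=False)
    except IndexError as exc:
        print("unpatched:", exc)
```

With patched=False the one-character case fails on the pattern[i + 1] lookup, which matches the IndexError you reported; with the length check in place the loop ends after emitting the start and end digrams.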

> I don't even know why GlobbingLexicon is getting involved in the
> search process since I am not trying to use wildcards and haven't
> elected to use a globbing vocabulary (AFAIK).

You must have somehow; GlobbingLexicon is never the default.

-Michel