[Zope-dev] Modifying Splitter.c to search on '+' & '#', and single letter words

Harry Wilkinson harryw@nipltd.com
Thu, 26 Jul 2001 10:38:55 +0100


This seems to work perfectly, thanks a lot :D

I am pretty sure I'm not using a globbing vocabulary: I tried deleting the
test ZCatalog I was using, creating a new one, and using the vocabulary it
gives me.  Is ZCatalog meant to use GlobbingLexicon.py for all vocabularies?

Well thanks again :)

Harry

Michel Pelletier wrote:

> Harry Wilkinson wrote:
> >
> > I have two problems with getting ZCatalog to search for what I need:
> >
> > 1) Need to be able to search for words like 'J++' and 'C#'
> >        - this is relatively simple to do by editing Splitter.c a little
> > and recompiling
> > 2) Need to be able to search for single-letter words like 'C'
> >        - this is easy to modify Splitter.c to accommodate, but causes
> > errors in GlobbingLexicon.py, even though the vocabulary is standard
> >
> > So far I have solved problem (1) by changing the contents of Splitter.c,
> > but that's a bit messy.  Currently I don't know of an alternative
> > though.
> >
> > I have modified Splitter.c so it indexes the extra characters, and
> > reduced the minimum word length to 1, which works fine when indexing,
> > and I can see all the symbol-inclusive words and single-letter words in
> > the vocabulary.  Unfortunately, any search on a single-letter word gives
> > an IndexError, "String out of range".
>
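The splitting rule Harry describes (keep '+' and '#' in words, allow a minimum word length of 1) can be sketched in Python. This is only an illustration, not a port of Splitter.c: the function name `split_words`, the regular expression, and the choice to allow the symbols only at the end of a word are all assumptions.

```python
import re

# Assumed rule (not taken from Splitter.c): a word is a run of ASCII
# letters or digits, optionally followed by '+' or '#' characters.
WORD_RE = re.compile(r"[A-Za-z0-9]+[+#]*")


def split_words(text, min_length=1):
    """Return the words in text, keeping trailing '+'/'#' and short words.

    min_length=1 mirrors the reduced minimum word length from the post.
    """
    return [w for w in WORD_RE.findall(text) if len(w) >= min_length]


words = split_words("I know C, C# and J++")
```

With `min_length=2`, single-letter words such as 'C' would be dropped again, which is roughly the behaviour the Splitter.c change is meant to avoid.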
> This is because GlobbingLexicon never anticipated single-letter
> patterns.  This is a bug.  Try this (untested) quick patch:
>
> Index: GlobbingLexicon.py
> ===================================================================
> RCS file: /cvs-repository/Zope2/lib/python/SearchIndex/GlobbingLexicon.py,v
> retrieving revision 1.9
> diff -c -r1.9 GlobbingLexicon.py
> *** GlobbingLexicon.py  2001/04/02 18:19:45     1.9
> --- GlobbingLexicon.py  2001/07/26 05:21:48
> ***************
> *** 221,226 ****
> --- 221,229 ----
>
>               if i == 0:
>                   digrams.insert(i, (self.eow + pattern[i]) )
> +                 if len(pattern) == 1:
> +                     digrams.append( (pattern[i] + self.eow) )
> +                     break
>                   digrams.append((pattern[i] + pattern[i+1]))
>               else:
>                   try:
>
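To see why a one-character pattern trips the original code, here is a standalone sketch of building digrams from a search pattern. It is not the GlobbingLexicon code: the function name `pattern_digrams`, the '$' end-of-word marker, and the '*' and '?' wildcards are assumptions for illustration. Instead of special-casing `len(pattern) == 1` as the patch does, it checks whether a next character exists before reading it, so `pattern[i+1]` is never indexed past the end.

```python
EOW = "$"                # assumed end-of-word marker
WILDCARDS = ("*", "?")   # assumed wildcard characters


def pattern_digrams(pattern, eow=EOW, wildcards=WILDCARDS):
    """Return the two-character sequences that a pattern must contain."""
    digrams = []
    for i, ch in enumerate(pattern):
        if ch in wildcards:
            continue  # wildcards contribute no digrams of their own
        if i == 0:
            digrams.append(eow + ch)   # word starts with this character
        if i + 1 == len(pattern):
            digrams.append(ch + eow)   # word ends with this character
        elif pattern[i + 1] not in wildcards:
            digrams.append(ch + pattern[i + 1])
    return digrams


single = pattern_digrams("C")
longer = pattern_digrams("cat")
```

For a single letter, the first and last character are the same position, so both the start-of-word and end-of-word digrams are produced, which matches what the patch adds.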
> > I am stuck on problem (2) and don't know how to avoid the errors arising
> > in GlobbingLexicon.py without editing in some kind of hack to get around
> > it.
>
> That's exactly what this patch does.
>
> > I don't even know why GlobbingLexicon is getting involved in the
> > search process since I am not trying to use wildcards and haven't
> > elected to use a globbing vocabulary (AFAIK).
>
> You must have somehow; GlobbingLexicon is never the default.
>
> -Michel