[Zope-dev] [Petition] Kludge for Splitter.c (long)

Michel Pelletier michel@digicool.com
Mon, 17 Jan 2000 16:29:53 -0500


> -----Original Message-----
> From: LEE, Kwan Soo [mailto:kslee@plaza1.snu.ac.kr]


> 1. How about a dumb kludge Splitter.c which treats the
> characters in a user-specifiable/configurable list as white
> space, treats all other characters up to char(255) as meaningful
> characters, and splits the text accordingly?

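This could fix your problem, but it won't work for multi-byte character strings, because a byte-oriented splitter sees the individual bytes of a character rather than the character as a whole.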
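The splitter proposed in point 1 can be sketched in Python roughly as follows. This is a minimal illustration, not Zope's Splitter.c. The function name `kludge_split` and the default separator list are hypothetical. Because it iterates over Python 3 `str` values (Unicode code points), a multi-byte character is handled as one unit, which sidesteps the caveat above. A byte-oriented C version would not get that for free.

```python
# Hypothetical sketch of the proposed "kludge" splitter: characters in a
# configurable separator list are treated as white space, and every other
# character is kept as part of a word. Works on str (code points), not bytes.

DEFAULT_SEPARATORS = " \t\r\n.,;:!?\"'()[]{}<>"  # assumed default, user-configurable


def kludge_split(text, separators=DEFAULT_SEPARATORS):
    """Split text on any character in `separators`, keeping all others."""
    seps = set(separators)
    words = []
    current = []
    for ch in text:
        if ch in seps:
            # A separator ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For example, `kludge_split("한국어 검색")` yields the two Korean words intact, since only the space is a separator.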

> In Korean, the current approach based on 'stem' words and
> 'stop' words will simply not work, because we have quite
> different writing conventions. I guess many other (smaller)
> languages have similar problems. Still, full-text search
> capabilities are too valuable to live without.

Currently, the stemming and stopping of words in ZCatalog is
English-language dependent.
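To see why stemming and stop-word removal tie an index to one language, consider this toy pipeline. The stop list and suffix rules are hypothetical English-only placeholders, not ZCatalog's actual tables.

```python
# Toy English-only normalizer. The stop list and suffix rules are
# hypothetical placeholders that illustrate language dependence.

ENGLISH_STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}
ENGLISH_SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching English suffix, keeping at least 3 chars."""
    for suffix in ENGLISH_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(words):
    """Lowercase, drop English stop words, then stem what remains."""
    result = []
    for w in words:
        w = w.lower()
        if w in ENGLISH_STOP_WORDS:
            continue
        result.append(naive_stem(w))
    return result
```

English input is reduced as intended, but Korean input passes through untouched. In `문서를` the object particle `를` stays attached, so `문서` and `문서를` would be indexed as different terms.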

> Furthermore, what if a Zope site contains documents in many 
> languages? I guess the approach based on _ONE_ locale will
> not work well. Does one need several personalities of Splitter?

Possibly, or a new approach to the whole problem.
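One reading of "several personalities of Splitter" is a registry that picks a splitter according to each document's language. The sketch below assumes documents carry a language code. The registry, function names, and codes are illustrative assumptions, not a Zope API.

```python
# Hypothetical registry mapping a document's language code to a splitter.
# Names and language codes are illustrative assumptions, not a Zope API.
import re


def split_latin(text):
    """Split on runs of non-word characters (suits many European languages)."""
    return [w for w in re.split(r"\W+", text) if w]


def split_whitespace(text):
    """Fallback splitter: break only on white space."""
    return text.split()


SPLITTERS = {
    "en": split_latin,
    "de": split_latin,
}


def split_document(text, language, default=split_whitespace):
    """Pick a splitter by language code, falling back to `default`."""
    return SPLITTERS.get(language, default)(text)
```

A language-specific splitter (for instance, one that strips Korean particles) could then be registered under its own code without affecting the others.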
 
> Before the "Full I18N/Localization Support" (I'm not sure what
> that means ...) of Python & ZOPE, a (maybe unsupported or

I18N means 'internationalization':  'I' followed by 18 chars followed
by 'N'.

> community-supported) kludge Splitter module with an adequate
> warning may ease the lives of lots of
> non-English/non-European-language Zopistas.
> 
> 2. Can anyone explain (or give a clue about) the difference
> between SearchIndex/ZCatalog in Zope 2.0.x and 2.1.x? In particular,
> what is the role of subindex in TextIndex.py and UnTextIndex.py? My
> Splitter.py produces errors whenever subindex is involved.

If your splitter works identically to the one that comes with Zope,
there should be no problem.
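"Works identically" plausibly means a replacement must expose the same interface the index code calls, not merely return the same words. As a hedged sketch, here is a Python splitter that behaves like a sequence of words and can report word positions. The class name and the `indexes` method are assumptions chosen for illustration. Compare them against Zope's own Splitter.c before relying on them.

```python
# Hypothetical sequence-style splitter object. The method names are
# assumptions for illustration; the interface the index code actually
# expects is whatever Zope's own Splitter.c provides.


class SequenceSplitter:
    def __init__(self, text):
        # Split on white space only; a real replacement would plug in
        # its own word-breaking rules here.
        self._words = text.split()

    def __len__(self):
        return len(self._words)

    def __getitem__(self, i):
        # Raising IndexError past the end also makes the object iterable.
        return self._words[i]

    def indexes(self, word):
        """Return every position at which `word` occurs."""
        return [i for i, w in enumerate(self._words) if w == word]
```

If index code calls a method that a custom splitter lacks, the failure shows up only on the code path that uses it. That is one possible reason errors appear only in particular cases.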

-Michel