[Zope] ZCatalog and foreign characters

Dieter Maurer dieter@handshake.de
Tue, 29 Aug 2000 23:57:08 +0200 (CEST)


Radim Gelner writes:
 > is it possible to make ZCatalog work correctly with words containing
 > characters other than those given in ISO-8859-1?
 > 
 > Now, it reports "no found" for all such queries even when these words
 > are present inside the documents on site.
I have made a very crude patch to "splitter.c" that makes it
treat every non-ASCII character as a letter.

Obviously, this is not correct: it may include punctuation in
words, so such a word will not be found unless the query contains
exactly the same punctuation.
Furthermore, non-ASCII letters are not converted to lowercase.
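
To illustrate the behaviour described above, here is a minimal Python
sketch. It is not the actual patch (which is in C, in "splitter.c");
the function name crude_split is hypothetical, and the rule that only
ASCII letters and bytes >= 0x80 count as letters is an assumption made
for illustration.

```python
def crude_split(data):
    """Split raw bytes into words.

    ASCII letters and every byte >= 0x80 are treated as letters.
    Only ASCII letters are lowercased; non-ASCII bytes are kept as-is.
    """
    words = []
    current = bytearray()
    for byte in data:
        is_ascii_upper = 0x41 <= byte <= 0x5A
        is_ascii_lower = 0x61 <= byte <= 0x7A
        if is_ascii_upper or is_ascii_lower or byte >= 0x80:
            if is_ascii_upper:
                byte += 0x20  # ASCII-only lowercasing
            current.append(byte)
        elif current:
            # Any other byte (space, ASCII punctuation, digit) ends the word.
            words.append(bytes(current))
            current = bytearray()
    if current:
        words.append(bytes(current))
    return words
```

Given the ISO-8859-2 bytes for "Žluťoučký §kůň", this sketch yields the
words "Žluťoučký" (the leading "Ž" is not lowercased) and "§kůň" (the
section sign is glued to the word), so a query for "kůň" alone would
not match.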

So far, it has given acceptable results for us.
However, we have not yet run any stress tests.

For a correct solution, the splitter must be told
the encoding, and Unicode letter classification and
case transformation must be applied.
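
A minimal Python sketch of what such a correct approach might look like,
assuming the encoding is passed in explicitly. The function name
unicode_split is hypothetical and this is not Zope code; it relies only
on Python's built-in str.isalpha() and str.lower(), which follow the
Unicode character database.

```python
def unicode_split(data, encoding):
    """Decode bytes with a known encoding and split them into lowercase words.

    A character belongs to a word if Unicode classifies it as a letter.
    """
    text = data.decode(encoding)
    words = []
    current = []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # Non-letters (spaces, punctuation, digits) end the word.
            words.append("".join(current).lower())
            current = []
    if current:
        words.append("".join(current).lower())
    return words
```

Given the ISO-8859-2 bytes for "Žluťoučký §kůň" and the encoding name
"iso-8859-2", this sketch yields "žluťoučký" and "kůň": the section sign
is dropped because it is not a letter, and the non-ASCII capital is
lowercased.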


If you work with a fixed locale, you can use the "-L" switch
to inform Zope about your locale. Then the splitter should
work correctly (for your locale).



Dieter