[Zope] [REQ] Support for multi-lingual components of TextIndexNG wanted

Andreas Jung andreas@andreas-jung.com
Mon, 17 Jun 2002 08:51:26 -0400


Hi folks,

The next version of TextIndexNG will focus on multi-lingual issues
(and will have full Unicode support).

I need some support from the community for components 
that are language-dependent:

- stopwords
 
  Stopwords are words that are removed during the indexing 
  process because they are very common, e.g. 'a', 'the', 'for' 
  in English.

- normalization

  Normalization means the translation of special characters
  or a sequence of characters to a simpler form, e.g.
  'Ä' -> 'Ae',  'ä' -> 'ae', 'ß' -> 'ss' or a more radical
  reduction like 'Ä' -> 'A',  'ä' -> 'a', 'ß' -> 's'.
  Such a reduction allows more fault-tolerant searching.
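
To make the two components concrete, here is a rough sketch in
Python. It is only an illustration, not TextIndexNG's actual code:
the names STOPWORDS_EN, NORMALIZE_DE, NORMALIZE_DE_RADICAL,
normalize() and remove_stopwords() are made up for this example,
and the tables contain only the entries mentioned above. For
simplicity it maps single characters only, not character sequences.

```python
# Hypothetical illustration; none of these names are TextIndexNG's API.

# Tiny example stopword list for English (stored in lowercase).
STOPWORDS_EN = frozenset(['a', 'the', 'for'])

# Two alternative normalization tables for German:
# a conservative one and a more radical reduction.
NORMALIZE_DE = {'Ä': 'Ae', 'ä': 'ae', 'ß': 'ss'}
NORMALIZE_DE_RADICAL = {'Ä': 'A', 'ä': 'a', 'ß': 's'}


def normalize(word, table):
    """Replace every character found in `table`; keep all others."""
    return ''.join(table.get(ch, ch) for ch in word)


def remove_stopwords(words, stopwords):
    """Drop the words whose lowercase form is in `stopwords`."""
    return [w for w in words if w.lower() not in stopwords]


if __name__ == '__main__':
    print(remove_stopwords('A gift for the king'.split(), STOPWORDS_EN))
    print(normalize('Straße', NORMALIZE_DE))
    print(normalize('Straße', NORMALIZE_DE_RADICAL))
```

With such tables, a query for 'Strasse' and an indexed 'Straße'
would end up as the same term, which is where the fault tolerance
comes from. A contribution for a new language would essentially
be the contents of the two data structures at the top.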

At the moment TextIndexNG supports only German and English.
If you would like to see more languages supported by TextIndexNG,
feel free to contribute stopword lists for your language
and/or translation rules for the normalization step.

Thanks,
Andreas