[Zope-dev] International Catalog

Michel Pelletier michel@digicool.com
Tue, 14 Sep 1999 11:36:31 -0400


I cc'ed some other people on this, because there is some important
locale information in this message.


> -----Original Message-----
> From: Martijn Faassen [mailto:m.faassen@vet.uu.nl]
> Sent: Tuesday, September 14, 1999 7:33 AM
> Cc: zope-dev@zope.org
> Subject: Re: [Zope-dev] Re: [Zope] Need a list of words not indexed by
> Catalog
> 
> Rik Hoekstra wrote:
> > 
> > > Terrel Shumway wrote:
> > > >
> > > > near the end of
> > > >         lib/python/SearchIndex/TextIndex.py
> > > > is a list called 'stop_words'
> > > >
> > > > [Zope Dev] It would be good to move this out of the .py file into
> > > > an editable, internationalizable resource file.
> > >
> > > Agreed! And then there's the *multi* lingual issue too. What if I
> > > have Dutch and English on my site?
> [snip]
> > It seems like you run into a _lot_ of complexities with multilingual
> > issues, and still these are real issues for many of us.
> 
> Yes, very real issues. Suddenly ZCatalog isn't the almost-ready tool to
> add searchability to the website I'm building anymore.. Now I need to do
> quite a bit of extra work, I imagine..

I am thinking heavily about this very problem as we speak.  You all
correctly pointed out some of the toughest of the problems.  Here are my
ideas so far:

Have 'vocabulary objects' store the stopwords, synonyms, stemming rules,
and lexicon (collection of uniquely indexed words) in a drop-in object
for ZCatalog.  This way, a 'French', 'Dutch' etc. vocabulary object
could be developed by a third party.
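To make the idea a bit more concrete, here is a rough sketch of what such
a vocabulary object might look like.  Nothing here is existing ZCatalog
code: the Vocabulary class, its method names and the toy stemmer are all
hypothetical, written in present-day Python purely for illustration.

```python
class Vocabulary:
    """Hypothetical drop-in vocabulary: stop words, synonyms, stemming, lexicon."""

    def __init__(self, stop_words=(), synonyms=None, stem=None):
        self.stop_words = frozenset(stop_words)
        self.synonyms = dict(synonyms or {})
        # Default "stemmer" leaves the word unchanged.
        self.stem = stem or (lambda word: word)
        # The lexicon maps each unique normalized word to an integer id.
        self.lexicon = {}

    def normalize(self, word):
        """Stop, syn and stem a word; return None for stop words."""
        word = word.lower()
        if word in self.stop_words:
            return None
        word = self.synonyms.get(word, word)
        return self.stem(word)

    def word_id(self, word):
        """Return the lexicon id of a word, storing it if it is new."""
        word = self.normalize(word)
        if word is None:
            return None
        return self.lexicon.setdefault(word, len(self.lexicon))


def toy_english_stem(word):
    # Toy rule for illustration only; a real stemmer is far more involved.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


english = Vocabulary(
    stop_words=["the", "and", "of"],
    synonyms={"stroll": "walk"},
    stem=toy_english_stem,
)
```

With something like this, a third party could ship a 'Dutch' vocabulary
just by supplying different stop words, synonyms and a stemming function,
and the index itself would not need to change.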

TextIndexes can then reference (or acquire) a vocabulary object through
which they can stop, syn, stem and store words in its lexicon.  There are
many other issues, like sharing lexicons between indexes for similar
languages, and having multiple back-end 'index/vocabularies' that all
look like one index, so you can search a 'document source' for either
'community' or 'communauté' or 'Gemeinschaft' and get only documents
relevant to that language (my apologies if these words are wrong, I'm
using babelfish).  I think this problem could be intractable, though: if
you search for 'walking' in English, the word stems down to 'walk', but
if you search for 'marche' en francais, should it stem down to
'promenade'?
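As a very rough sketch of the 'several back-ends that look like one
index' idea, assuming each document is tagged with a language when it is
indexed: the MultiLanguageIndex class and its methods are hypothetical
names, and stemming is left out entirely since that is exactly the hard
part.

```python
from collections import defaultdict


class MultiLanguageIndex:
    """Hypothetical front-end over one small word index per language."""

    def __init__(self):
        # language code -> {word -> set of document ids}
        self._indexes = defaultdict(lambda: defaultdict(set))

    def index(self, doc_id, language, text):
        """Store every word of the text in that language's own index."""
        for word in text.lower().split():
            self._indexes[language][word].add(doc_id)

    def search(self, word, language=None):
        """Search one language, or every language if none is given."""
        languages = [language] if language else list(self._indexes)
        hits = set()
        for lang in languages:
            hits |= self._indexes.get(lang, {}).get(word.lower(), set())
        return hits
```

A search for 'community' without a language would consult every
back-end, but only the English one holds that word, so only English
documents come back.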

Anyways, there is some good news.  For those of you tracking CVS, we have
added the ability to set your locale in Zope.  This means that, for
example, the splitter/stemmer in the catalog will recognize all of
those umlauts and accented letters and whatnots that English doesn't
have.  We would like a few people all over the place to try this out.
If your locale has a different language or monetary system than the US
(just about everywhere except some of Canada) this might make the
catalog and other parts of Zope more useful for you.
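To illustrate why this matters for the splitter, here is a small
stand-in written in present-day Python.  It is not the catalog's
splitter, and it uses a regular expression flag rather than the process
locale; it only shows the difference between treating ASCII letters as
word characters and also accepting accented ones.  The split_words
function is a hypothetical name.

```python
import re

# ASCII-only word characters, roughly what a splitter sees without a locale.
ASCII_WORDS = re.compile(r"\w+", re.ASCII)
# Word characters including accented letters and umlauts.
ACCENT_AWARE_WORDS = re.compile(r"\w+")


def split_words(text, accent_aware=True):
    """Split text into lowercased words using one of the two patterns."""
    pattern = ACCENT_AWARE_WORDS if accent_aware else ASCII_WORDS
    return [word.lower() for word in pattern.findall(text)]
```

Without the accent-aware pattern, a word like 'communauté' is cut short
at the accented letter and 'für' falls apart into two single letters,
which is the kind of damage the locale setting is meant to prevent.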

Locale support can be activated from the z2.py command line with the
'-L' option.  "-L ''" (an empty string) will cause locale to try and
autodetect your locale from your environment variables (you must set the
env variables yourself, see 'man 7 locale').  Alternatively, you can say
"-L de" and set your locale to German.
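If you want to check what your system will accept before starting Zope,
something like the following works from a plain Python prompt.  It uses
the standard locale module; the activate_locale helper is a hypothetical
name, and this is only a sketch of the general mechanism, not a copy of
what z2.py does with the '-L' value.

```python
import locale


def activate_locale(locale_id):
    """Hypothetical helper: apply a locale id to the whole process.

    An empty string asks the C library to pick the locale from the
    environment variables.  Returns the resulting locale setting, or
    None if the system does not support the requested locale.
    """
    try:
        return locale.setlocale(locale.LC_ALL, locale_id)
    except locale.Error:
        return None


if __name__ == "__main__":
    # Autodetect from the environment, as an empty '-L' value is meant to.
    print("from environment:", activate_locale(""))
    # Ask for German explicitly; this may print None on some systems.
    print("explicit 'de':", activate_locale("de"))
```

Whether a short name like 'de' is accepted depends on which locales are
installed on your platform, so a None result here is worth reporting
along with your operating system.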

Please folks, test this out for us.  We don't really have the means to
do it here.

-Michel