[Zope-dev] ZCatalog cannot support chinese?

Brian Takashi Hooper brian@garage.co.jp
Fri, 25 Feb 2000 12:19:55 +0900


Hi Victor,

I wrote a Splitter for Japanese. The same change also turns Splitter
into an ExtensionClass, so it can be subclassed to build Splitters for
other languages.

Once you can split Chinese documents into separate words, then ZCatalog
can pretty much handle the rest on its own.
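
To give a rough idea of what "subclassing a Splitter" could look like,
here is a minimal sketch in plain Python. It is not the actual Zope
Splitter API: the class names, the split() method and the tiny word list
are all made up for illustration, and the segmentation technique
(forward maximum matching against a dictionary) is just one simple
option, far cruder than a real lexical analyzer.

```python
class Splitter:
    """Hypothetical base class: turns a document into a list of index words."""

    def split(self, text):
        # Default behaviour: whitespace-delimited words, lowercased.
        return [w.lower() for w in text.split()]


class DictionarySplitter(Splitter):
    """Hypothetical subclass: forward maximum matching against a word list."""

    def __init__(self, words):
        self.words = set(words)
        # Longest dictionary entry bounds how far ahead we need to look.
        self.max_len = max((len(w) for w in self.words), default=1)

    def split(self, text):
        result = []
        i = 0
        while i < len(text):
            if text[i].isspace():
                i += 1
                continue
            # Try the longest candidate first; fall back to one character
            # when nothing in the dictionary matches at this position.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                candidate = text[i:i + size]
                if size == 1 or candidate in self.words:
                    result.append(candidate)
                    i += size
                    break
        return result


# Illustrative three-word dictionary, not a real lexicon.
splitter = DictionarySplitter(["中文", "搜索", "支持"])
print(splitter.split("支持中文搜索"))
```

The point is only that the indexing code calls split() and does not
care how the words were found, so a language-specific subclass can swap
in whatever analysis it needs.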

I would be happy to help you create Chinese support, since I'm pretty
interested in that myself. What I need your help finding is a library
that does lexical analysis of Chinese text, preferably a free one. For
Japanese I used ChaSen, a Japanese text analysis library from the Nara
Institute of Science and Technology: I feed it a Japanese document, it
checks a dictionary, and it tells me what all the words are, what part
of speech each word is, etc. The ChaSen home page is all in Japanese, so
I probably couldn't have found it without a Japanese-capable environment
(and Japanese language skills)...
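
As a sketch of why the part-of-speech information is useful for
indexing: once an analyzer has returned (word, part of speech) pairs,
the splitter can keep only the words worth searching on. The code below
does not call ChaSen; the token list is hand-written stand-in data, and
the function name and part-of-speech labels are assumptions made for the
example.

```python
def index_words(tokens, keep=("noun", "verb", "adjective")):
    """Return the words whose part of speech is in `keep`.

    `tokens` is a list of (word, part_of_speech) pairs, as a
    hypothetical analyzer might return them.
    """
    return [word for word, pos in tokens if pos in keep]


# Hand-written stand-in for analyzer output (not real ChaSen output).
tokens = [("猫", "noun"), ("が", "particle"), ("走る", "verb")]
print(index_words(tokens))
```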

Could you try to find a similar library for Chinese?

If so, I will help you make it work for searching...

--Brian Hooper

On Thu, 24 Feb 2000 18:14:57 -0800
Michel Pelletier <michel@digicool.com> wrote:

> 
> 
> Victor.Zhai@ogilvy.com wrote:
> > 
> > Hi all,
> >    In my project, I want to use ZCatalog to build a search interface,
> > but it does not support Chinese. Can someone give me some advice on it?
> 
> ZCatalog does not currently support Chinese for several reasons:
> 
>  1) I've never seen or worked with Chinese, and I have no environment to
> debug it.
> 
>  2) Python itself is still working on complete internationalization.
> 
>  3) ZCatalog is very English-centric.
> 
> However, I am working on several enhancements to ZCatalog which will
> help you here.  First, ZCatalog now supports the notion of
> Vocabularies.  Vocabularies are separate objects from ZCatalogs, and
> they separate all of the language-specific features from ZCatalog.
> Therefore, if you subclass and create your own kind of Vocabulary
> (say, ChineseVocabulary), you can:
> 
>   1) create your own kind of 'Splitter', which is the object that splits
> documents into words.  Currently Zope's splitter is very simple and only
> knows how to split words on spaces, which works for English (and some
> European languages).  Splitting Chinese probably requires a much
> different algorithm.
> 
>   2) control stop words and synonyms.  Right now, Zope has hard-coded
> stopwords that are English only, and no synonym support.  In 2.2, Zope
> Vocabularies will allow you to control these stopwords and synonyms in a
> language-neutral fashion.
> 
> These features are in the current CVS but they are still quite raw. 
> What would help is the currently unreleased ZCatalog User's Guide, the
> latest version of which is currently on a Zip disk packed in a box
> somewhere here in my apartment.  I should really dig that up.
> 
> But for Chinese support, you're going to have to roll up your sleeves a
> little and subclass your own kind of Vocabulary object.  This is not
> really so hard to do; it's just hard to understand without
> documentation.
> 
> -Michel
> 