[Zope-Dev] Some thoughts on splitter (Sin Hang Kin)

Sin Hang Kin kentsin@poboxes.com
Mon, 17 Apr 2000 12:19:25 +0800


----- Original Message -----
From: <zope-dev-admin@zope.org>
To: <zope-dev@zope.org>
Sent: Monday, April 17, 2000 3:00 AM
Subject: Zope-Dev digest, Vol 1 #474 - 8 msgs


> I mean portability across other objects that may want to 'use' the
> document object.  If the object gets invisibly transformed, and other
> objects don't expect this, things will break.  Also, unless the user
> specifically wants their text to be transformed, they may be
> surprised/angered that their text was normalized to unicode.

There are two separate things here:
1. Inserting a non-joiner to mark the break points between words.
2. The normalization process.

Step 1 really does change the document. But that is still not something
ZCatalog does. It is up to the content manager to decide whether to do it.
If he decides to, he should prepare the content as required, or write a
pre-processor to do it. The splitter only recognizes the non-joiner as a
word break point, just as it recognizes space and tab as word break
points. Zope does not make any decision that nobody wants.
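
To illustrate step 1, here is a minimal sketch of a splitter that treats
the non-joiner as one more break character next to space and tab. It is
not the ZCatalog splitter. The name split_words is hypothetical, and the
sketch assumes "non-joiner" means U+200C ZERO WIDTH NON-JOINER and that
the content manager has already inserted it into the text.

```python
import re

# Assumption: the break marker is U+200C (ZERO WIDTH NON-JOINER).
ZWNJ = "\u200c"

# Break on any run of whitespace (space, tab, newline) or the marker.
_BREAKS = re.compile("[\\s" + ZWNJ + "]+")


def split_words(text):
    """Hypothetical splitter: return the words of `text` in order.

    The input string is never modified; only the returned list is new.
    """
    return [word for word in _BREAKS.split(text) if word]


if __name__ == "__main__":
    sample = "全文" + ZWNJ + "檢索 zope\tcatalog"
    print(split_words(sample))
```

Text without the marker is split on whitespace only, so nothing changes
for content whose manager chose not to prepare it.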

Step 2 is performed when building the index, just as you would capitalize
the index terms. Nothing changes the original content. Only when ZCatalog
builds the index does it convert the various encodings to Unicode,
normalize the text, and optionally make further changes such as stemming,
synonym combination, etc. None of this changes the content.
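
To illustrate step 2, here is a rough sketch of index-time normalization
that leaves the stored content alone. It is not how ZCatalog is
implemented. The name index_terms is hypothetical, the choice of NFKC as
the normalization form and lower-casing as the case folding are
assumptions, and stemming and synonym handling are left out.

```python
import unicodedata


def index_terms(raw, encoding):
    """Hypothetical index-time step: derive index terms from stored bytes.

    `raw` is the stored content and `encoding` is its declared encoding.
    Only the returned list is new; `raw` itself is left as it was.
    """
    text = raw.decode(encoding)                 # convert to Unicode
    text = unicodedata.normalize("NFKC", text)  # assumed normalization form
    return [word.lower() for word in text.split()]  # fold case per term


if __name__ == "__main__":
    stored = "Ｚｏｐｅ Catalog".encode("utf-8")  # full-width letters in content
    print(index_terms(stored, "utf-8"))
    print(stored.decode("utf-8"))  # the stored content is unchanged
```

The full-width and ASCII spellings end up as the same index term, while
the document keeps whatever form the author typed.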

Rgs,

Kent Sin