[Zope-dev] ZCatalog with UTF-8 Chinese

Sin Hang Kin kentsin@poboxes.com
Thu, 28 Sep 2000 08:08:41 +0800


Dear Developer:

I am trying to short-cut UnTextIndex so that it handles UTF-8 Chinese, and I
need some help.

After reading some of the query code, I think the regular expression operations
in parse, quotes and parse2 are not safe for UTF-8 strings, so I have decided
to emulate what they do. However, I do not understand what getLexicon is doing,
and I would like to learn what q should look like before it is passed to
evaluate. The vocabulary seems to store words as integers. Is getLexicon the
step that looks up each string and converts it to an integer? I am getting
lost.
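
To make the question concrete, the idea of a vocabulary that stores words as
integers can be pictured with a toy mapping. This is only a sketch under that
assumption, written for modern Python 3; the class name ToyLexicon and its
methods are hypothetical and are not the Zope Lexicon API.

```python
# Toy illustration only: ToyLexicon is a hypothetical name, not Zope code.
class ToyLexicon:
    def __init__(self):
        # Maps each known word to a small integer id.
        self._word_to_id = {}

    def add(self, word):
        """Return the id for word, assigning the next free id if it is new."""
        if word not in self._word_to_id:
            self._word_to_id[word] = len(self._word_to_id)
        return self._word_to_id[word]

    def get(self, word):
        """Return a list holding the word's id, or an empty list if unknown."""
        word_id = self._word_to_id.get(word)
        return [] if word_id is None else [word_id]


lexicon = ToyLexicon()
for word in ["zope", "中文", "catalog"]:
    lexicon.add(word)

ids = lexicon.get("中文")  # ids were assigned in insertion order, starting at 0
```

If the real lexicon works along these lines, an index would only ever need to
store the integer ids, and a query would have to go through the same lookup
before it can be matched.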

Could an experienced developer help me out with this?

Rgs,

Kent Sin
---------------------------------
kentsin.weblogs.com
kentsin.imeme.net


def query(self, s, default_operator = Or, ws = (string.whitespace,)):
    """
    This is called by TextIndexes. A 'query term' which is a string
    's' is passed in, along with an index object. s is parsed, then
    the wildcards are parsed, then something is parsed again, then the
    whole thing is 'evaluated'
    """
    # First replace any occurrences of " and not " with " andnot "
    s = ts_regex.gsub('[%s]+and[%s]*not[%s]+' % (ws * 3), ' andnot ', s)

    # do some parsing
    q = parse(s)

    ## here, we give lexicons a chance to transform the query.
    ## For example, substitute wildcards, or translate words into
    ## various languages.
    q = self.getLexicon(self._lexicon).query_hook(q)

    # do some more parsing
    q = parse2(q, default_operator)

    ## evaluate the final 'expression'
    return self.evaluate(q)
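
One possible way to avoid byte-oriented regular expressions is to decode the
query to Unicode before splitting it into terms. The sketch below is written
for modern Python 3 rather than the Python this code targets, and the helper
names is_cjk and split_terms are hypothetical. It treats every CJK ideograph
as its own term, which is a simple character-unigram approach swapped in for
illustration and not what parse does.

```python
def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def split_terms(raw):
    """Decode UTF-8 bytes, then split the text into search terms.

    Whitespace separates ordinary words; each CJK ideograph becomes
    a term of its own, so no multi-byte character is ever cut in half.
    """
    text = raw.decode("utf-8")
    terms = []
    word = []  # characters of the non-CJK word being collected

    def flush():
        if word:
            terms.append("".join(word))
            del word[:]

    for ch in text:
        if is_cjk(ch):
            flush()
            terms.append(ch)
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    return terms


terms = split_terms("zope 中文 catalog".encode("utf-8"))
# terms == ['zope', '中', '文', 'catalog']
```

The terms produced this way would still have to be combined with the query
operators and looked up in the lexicon, which is the part the question above
is about.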