[Zope] Sorting is broken with UTF-8?

Daniel Dekany ddekany at freemail.hu
Thu Apr 14 09:25:50 EDT 2005


Thursday, April 14, 2005, 12:35:37 PM, Andreas Jung wrote:

> --On Thursday, 14 April 2005 12:20 +0200 Daniel Dekany
> <ddekany at freemail.hu> wrote:
>
>> I have a Zope 2.7.0(+Plone) instance that uses utf-8 encoding
>> everywhere. The problem is that alphabetical sorting (like with
>> DocumentTemplate.sequence.sort(seq, 'locale', ...)) is broken
>> everywhere: accented letters come after all US-ASCII characters. I have
>> locale=hu_HU.UTF-8 in zope.conf, still it seems that the collation
>> algorithm can't handle UTF-8 encoded strings correctly, and since 0x80
>> is higher than the code of the US-ASCII characters, a character that is
>> out of the US-ASCII range will be later than the US-ASCII ones. Actually
>> Python can't sort UTF-8 with strcoll either (at least I couldn't achieve
>> that), I guess the root of the problem is there.
>
> Right. This is not a Zope problem so better ask the Python world or file a
> Python bug report.
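
To make the symptom concrete, here is a minimal sketch of what I believe
is going on. It is written for modern Python 3 rather than the Python
that ships with Zope 2.7, and the word list is made up for illustration;
it is not code from the site. It only shows that comparing the raw UTF-8
bytes puts every accented letter after all US-ASCII letters:

```python
# Python 3 sketch; the word list is hypothetical sample data.
# "\u00e1" is "á" and "\u00ed" is "í", so the second word is "árvíz".
words = ["zebra", "\u00e1rv\u00edz", "alma"]

# Encode to UTF-8 and sort the byte strings, which compares byte values.
encoded = sorted(w.encode("utf-8") for w in words)

# "á" is encoded as the two bytes 0xC3 0xA1, both higher than any
# US-ASCII byte, so "árvíz" ends up after "zebra".
byte_order = [b.decode("utf-8") for b in encoded]
print(byte_order)
```

With byte-wise comparison the accented word comes last, which matches
the ordering I see on the site.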

I see, but then my question is: how do people use Zope for sites where
"Unicode" is needed? Do they just not use Zope in such cases? At my new
employer there is a fat Plone site that has been running for months with
the sorting problems mentioned above. I don't know why my predecessor
built it with UTF-8 if that is not supported. And if it really is not
supported, then I hope there is some utility with which I can convert
the charset of a whole Zope database... is there?

>> So, what should I do now? UTF-8 charset doesn't work in reality with
>> Zope so I should forget it and switch to ISO-8859-x?
>
> sequence.sort() accepts also custom comparison methods. So you could write
> your own method *somehow*.

That would be OK for me if it works. The problem is that sorting mostly
happens in 3rd party products, and they will call sequence.sort with
'locale', 'locale_nocase' and such, not with my custom comparison
function. OK, I could then patch Zope's sequence.sort so that it is
UTF-8 aware even with 'locale' and 'locale_nocase'. But that is still
not good, because there will be places where Python's locale.strcoll is
used directly, and worse, maybe both sequence.sort and locale.strcoll
are used on the same sequence in different places, and then there will
be inconsistencies. So in the end I would have to patch Python, which is
really beyond my competence. But I don't know, I'm totally new to Python
and Zope (I'm primarily a Java guy)... so am I missing something?
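
For reference, the kind of custom comparison suggested above might look
roughly like the sketch below. It is Python 3, and the names
accent_insensitive_key and sort_utf8 are hypothetical helpers of mine,
not part of Zope or Python. It decodes the UTF-8 bytes and sorts on the
text with accents stripped. This is only an approximation: it does not
implement the real hu_HU collation rules (for example, it would file ö
and ő together with o, and it knows nothing about digraphs such as cs).

```python
import unicodedata


def accent_insensitive_key(raw, encoding="utf-8"):
    """Hypothetical sort key: decode, strip accents, ignore case."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    # NFD splits "á" into "a" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks so that "á" compares like "a".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original text is a tie-breaker, so "a" sorts before "á".
    return (base.casefold(), text)


def sort_utf8(seq):
    """Return a new list sorted with the approximate key above."""
    return sorted(seq, key=accent_insensitive_key)


# Hypothetical sample data: UTF-8 encoded byte strings.
names = [b"zebra", "\u00e1rv\u00edz".encode("utf-8"), b"alma"]
result = [n.decode("utf-8") for n in sort_utf8(names)]
print(result)
```

Here "árvíz" sorts between "alma" and "zebra". Of course this only helps
where my own code does the sorting, which is exactly the limitation
described above.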

> -aj

-- 
Best regards,
 Daniel Dekany


