[Zope] Zope 2.6.1 and UTF-8

Mark Barratt markb at textmatters.com
Wed Sep 10 17:56:54 EDT 2003


Most of this discussion is over my head. But there's one pretty basic
misunderstanding exhibited:

>>>> I've got some stuff that's in strings, so I guess not unicode, but
>>>> which is UTF-8 encoded, and I'm wondering how I make sure Zope does
>>>> "the
>>>> right thing" here. Are there any docs about?

and

>> Hmmm, that's interesting. I'd been planning on keeping everything as
>> UTF-8 encoded strings rather than actual unicode. What leads you to
>> suggest storing everything as unicode?

and

>> Finally, is ZCTextIndex compatible with either unicode or strings that
>> contain UTF-8 encoding?

UTF-8 is one way of encoding the Unicode character set. They are not
different things. When you use UTF-8 you are using Unicode.
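
To make the relationship concrete, here is a minimal sketch. It assumes a
modern Python 3 interpreter (not the Python that shipped with Zope 2.6.1),
where text and encoded bytes are separate types. The sample string is made up
for illustration.

```python
# Assumption: Python 3, where str holds Unicode text and bytes holds encoded data.
text = "caf\u00e9"                    # four Unicode characters, the last is e-acute

utf8_bytes = text.encode("utf-8")     # the same text, serialised as UTF-8 bytes

# The non-ASCII character takes two bytes in UTF-8, so the byte count is larger.
char_count = len(text)
byte_count = len(utf8_bytes)

# Decoding with the matching codec recovers the original text exactly.
round_tripped = utf8_bytes.decode("utf-8")
```

The bytes and the text carry the same characters; only the representation
differs.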

UTF-8 exists to allow systems to migrate gently, because it translates a
very large character set into a format that will not normally break file
systems which expect 8-bit character data. There are also 16-bit and 32-bit
representations of Unicode (UTF-16 and UTF-32).
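
A rough comparison of the three encodings, again assuming Python 3 and its
standard codec names. The point is that the wider encodings put zero bytes
into plain ASCII text, which is the kind of thing byte-oriented software
tends to choke on, while UTF-8 does not.

```python
# Assumption: Python 3 standard codecs; the "-le" variants omit the byte-order mark.
letter = "A"
euro = "\u20ac"                       # EURO SIGN, outside ASCII and Latin-1

utf8_letter = letter.encode("utf-8")        # one byte, same as ASCII
utf16_letter = letter.encode("utf-16-le")   # two bytes, one of them zero
utf32_letter = letter.encode("utf-32-le")   # four bytes, three of them zero

# Bytes needed for the euro sign in each encoding.
euro_sizes = {
    "utf-8": len(euro.encode("utf-8")),
    "utf-16": len(euro.encode("utf-16-le")),
    "utf-32": len(euro.encode("utf-32-le")),
}

# UTF-8 never emits a zero byte for a non-NUL character.
utf8_has_nul = b"\x00" in euro.encode("utf-8")
```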

UTF-8's representation of ASCII is identical to ASCII's. So for
applications which internally process only ASCII, the encoding is moot.
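
A quick check of that claim, as a sketch in Python 3 with an arbitrary
ASCII-only sample string:

```python
# Assumption: Python 3; the sample string is arbitrary but pure ASCII.
ascii_text = "Zope 2.6.1 and UTF-8"

as_ascii = ascii_text.encode("ascii")
as_utf8 = ascii_text.encode("utf-8")

# For pure ASCII text the two encodings produce byte-for-byte identical output.
same_bytes = as_ascii == as_utf8
```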
But if you have user input you need to watch out: Windows and MacOS
support UTF-8 input in browser forms. This input can seriously break
old apps.
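
A sketch of how that breakage can look, assuming Python 3 and a made-up form
value. An app that assumes Latin-1 silently garbles the text, and one that
assumes ASCII fails outright.

```python
# Assumption: Python 3; "form_value" stands in for text typed into a browser form.
form_value = "Zo\u00eb"                     # "Zoe" with a diaeresis on the e
submitted = form_value.encode("utf-8")      # what a UTF-8 browser would send

# An app that assumes Latin-1 decodes without error but gets the wrong characters.
as_latin1 = submitted.decode("latin-1")

# An app that assumes ASCII cannot decode the bytes at all.
try:
    submitted.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Decoding with the encoding the browser actually used is what avoids both
failures.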

best

Mark Barratt


