[Zope3-dev] i18n, unicode, and the underline
Shane Hathaway
shane@zope.com
Mon, 14 Apr 2003 11:50:33 -0400
Martijn Faassen wrote:
> Shane Hathaway wrote:
>
>>May I suggest that while Python's Unicode support is transitional, all
>>methods and functions that expect to manipulate Unicode should convert
>>strings to Unicode at runtime? Not all functions would have to do this,
>>only those that concatenate strings (I think).
>
>
> Hm, I think that this is a bad idea:
>
> * how can I convert strings to unicode if I don't know what encoding the
> string is in?
>
> * why pay for this performance and code complexity impact?
>
> If your framework properly uses unicode, this is endless overhead on the
> programmer for no good reason.
That's what I thought when I made the transition from C++ to Java. I
was pretty skeptical, but here's what I figured out:
- The source file should be written in 7-bit ASCII, so the default
encoding doesn't matter. (I *think* that's the story.)
- The code doesn't increase in complexity as long as the core functions
accept either strings or Unicode. Except when you're doing I/O or C
extensions, you can forget that you're using Unicode at all. If you
have to add Unicode later for I18N, you'll pay a much higher price in
complexity.
- The only real difference between 8-bit character strings and 32-bit
character strings is 24 bits per character. :-) Modern processors deal
with either kind of string with virtually equal speed. The only cost is
in conversion between the formats, and if your program is typical, the
conversion only needs to happen when it's communicating with the outside
world.
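
To make the second and third points concrete, here is a minimal sketch of
what I mean, not code from Zope. It is written for Python 3, where `bytes`
stands in for 8-bit strings and `str` for Unicode. The names `to_text`,
`join_label`, and `write_out` are made up for illustration, and the ASCII
default for decoding is an assumption that follows from the first point.

```python
def to_text(value, encoding="ascii"):
    """Return value as text, decoding it first if it is an 8-bit string."""
    if isinstance(value, bytes):
        # Non-ASCII bytes raise UnicodeDecodeError here instead of being
        # guessed at, since we cannot know their encoding.
        return value.decode(encoding)
    return value


def join_label(prefix, name):
    """Core function: accepts either kind of string, returns text."""
    return to_text(prefix) + ": " + to_text(name)


def write_out(text, encoding="utf-8"):
    """I/O boundary: the only place text is converted back to bytes."""
    return text.encode(encoding)


label = join_label(b"user", "Zo\u00eb")
payload = write_out(label)
```

Only `to_text` and `write_out` know that two kinds of string exist; code
like `join_label` just concatenates. An 8-bit string that is not plain ASCII
fails loudly in `to_text`, which is about the best you can do when the
encoding is unknown.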
Shane