[Zope3-dev] i18n, unicode, and the underline

Shane Hathaway shane@zope.com
Mon, 14 Apr 2003 11:50:33 -0400


Martijn Faassen wrote:
> Shane Hathaway wrote:
> 
>>May I suggest that while Python's Unicode support is transitional, all 
>>methods and functions that expect to manipulate Unicode should convert 
>>strings to Unicode at runtime?  Not all functions would have to do this, 
>>only those that concatenate strings (I think).
> 
> 
> Hm, I think that this is a bad idea:
> 
>   * how can I convert strings to unicode if I don't know what encoding the
>     string is in?
> 
>   * why pay for this performance and code complexity impact?
> 
> If your framework properly uses unicode, this is endless overhead on the
> programmer for no good reason.

That's what I thought when I made the transition from C++ to Java.  I 
was pretty skeptical, but here's what I figured out:

- The source file should be written in 7-bit ASCII, so the default 
encoding doesn't matter.  (I *think* that's the story.)

- The code doesn't increase in complexity as long as the core functions 
accept either 8-bit strings or Unicode.  Except when you're doing I/O or 
writing C extensions, you can forget that you're using Unicode at all.  
If you have to add Unicode later for I18N, you'll pay a much higher price 
in complexity.

- The only real difference between 8-bit character strings and 32-bit 
character strings is 24 bits per character. :-)  Modern processors deal 
with either kind of string with virtually equal speed.  The only cost is 
in conversion between the formats, and if your program is typical, the 
conversion only needs to happen when it's communicating with the outside 
world.
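
To make the last two points concrete, here is a rough sketch of what I 
have in mind.  It is only an illustration under a few assumptions: the 
helper names (to_text, join_text, handle_request) are made up for this 
message and are not Zope APIs, the default of ASCII follows the first 
point above, and it is written against Python 3, where bytes stands in 
for the 8-bit string and str for the Unicode string.

```python
# Sketch only: to_text, join_text and handle_request are hypothetical
# helpers, not part of Zope or the standard library.

def to_text(value, encoding="ascii"):
    """Return value as text, decoding it first if it is an 8-bit string."""
    if isinstance(value, bytes):
        # Strict decoding: bytes outside the assumed encoding raise
        # UnicodeDecodeError rather than being silently guessed at.
        return value.decode(encoding)
    return value


def join_text(*parts):
    """Concatenate parts, accepting 8-bit and Unicode strings alike."""
    return "".join(to_text(part) for part in parts)


def handle_request(raw_name, encoding="utf-8"):
    """Convert only at the edges; everything in between works on text."""
    name = raw_name.decode(encoding)             # outside world -> Unicode
    greeting = join_text(b"Hello, ", name, "!")  # mixed inputs are fine here
    return greeting.encode(encoding)             # Unicode -> outside world
```

Only handle_request needs to know the wire encoding.  join_text never 
sees it, and an 8-bit string that is not plain ASCII fails loudly in 
to_text instead of being mis-decoded, which is one possible answer to 
the "what encoding is it in?" question.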

Shane