[Zope3-dev] i18n, unicode, and the underline

Mon, 14 Apr 2003 09:56:03 -0400

Martijn Faassen wrote:
> Guido van Rossum wrote:
>>I've heard this before, and it seems to be the main reason why Zope 3
>>has chosen an (IMO unnecessary) anal attitude towards Unicode.  But
>>nobody has ever been able to show me what the problems were.  (I'm not
>>doubting there were problems; I just need more details to understand
>>what was the matter.)
> 
> 
> I doubt the problem is really big with literals, as it's fairly easy to
> make sure they're ascii only. It does exist with user input, which can
> be anything, for instance latin-1. The problem occurs when users start
> mixing (say) latin-1 with unicode. This seems to work in tests, until suddenly
> a user enters some non-ascii character, and then suddenly code starts to
> give unicode errors in locations that are not always easy to figure out.
> 
> This is similar to the use of one floating point number in an integer
> calculation, except that it's worse, as the application can (but,
> depending on input, does not necessarily) raise exceptions instead of
> just producing not-quite-right output.

May I suggest that while Python's Unicode support is transitional, all 
methods and functions that expect to manipulate Unicode should convert 
strings to Unicode at runtime?  Not all functions would have to do this, 
only those that concatenate strings (I think).
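
A minimal sketch of that suggestion, written for a present-day Python where
bytes stands in for the 8-bit string type. The names to_text and join_text
and the default encoding of ASCII are assumptions made for illustration;
they are not existing Zope or Python APIs.

```python
def to_text(value, encoding="ascii"):
    """Return value as Unicode text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        # Decode at the boundary, so a bad byte fails here and not
        # somewhere deep inside later string handling.
        return value.decode(encoding)
    return value


def join_text(*parts, encoding="ascii"):
    """Concatenate parts after converting every one of them to text."""
    return "".join(to_text(part, encoding) for part in parts)


if __name__ == "__main__":
    # Pure ASCII bytes mix with text without trouble.
    print(join_text(b"abc", "def"))
    # Latin-1 input works only when the caller names the encoding.
    print(join_text(b"caf\xe9", " au lait", encoding="latin-1"))
```

With the default encoding, join_text(b"caf\xe9", "!") raises
UnicodeDecodeError inside to_text, which at least puts the failure at the
point where the byte string enters the function.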

While we're on the subject, I'd like to discuss some thoughts I've had 
on Zope I/O and Unicode.  It seems to me that the HTTP server should 
expect to work with pure byte arrays, since HTTP has only minimal 
provisions for Unicode.  I think the publisher should be the component 
that does the work of converting byte-array requests into Unicode, and 
of converting Unicode into byte arrays for the response.  What do you think?
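
To make the proposed split concrete, here is a rough sketch under stated
assumptions: the server hands over raw bytes, and a publisher-like layer
decodes them, calls the application with Unicode, and encodes the result.
The function names, the handler callable, and the UTF-8 default are all
hypothetical; this is not the actual Zope publisher interface.

```python
def decode_request(raw_body, charset="utf-8"):
    """Turn the byte array received from the HTTP server into text."""
    return raw_body.decode(charset)


def encode_response(text, charset="utf-8"):
    """Turn application text into headers plus a byte array to send."""
    body = text.encode(charset)
    headers = [
        ("Content-Type", "text/html; charset=%s" % charset),
        # The length is counted in bytes, after encoding.
        ("Content-Length", str(len(body))),
    ]
    return headers, body


def publish(raw_body, handler, charset="utf-8"):
    """Bytes in, bytes out; the handler itself only ever sees text."""
    text_in = decode_request(raw_body, charset)
    text_out = handler(text_in)
    return encode_response(text_out, charset)


if __name__ == "__main__":
    headers, body = publish("na\u00efve".encode("utf-8"), lambda s: s.upper())
    print(headers)
    print(body)
```

In this arrangement the server never needs to know the charset; only the
publisher layer does, and application code sees Unicode throughout.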

Shane