[Zope3-dev] i18n, unicode, and the underline
Shane Hathaway
shane@zope.com
Mon, 14 Apr 2003 12:56:55 -0400

Guido van Rossum wrote:
>>That still leaves the literal string "hole", and IMHO it's a really big
>>hole. Three methods have been suggested for patching this hole:
>>prefixing all literal strings with "u", calling the unicode() builtin in
>>code that concatenates strings, or using 'python -U'. Since 'python -U'
>>doesn't quite work, we only have the first two options for now, and both
>>are a burden for the programmer. One requires uglifying the source, and
>>the other requires deeper knowledge than we wanted to require of
>>programmers. Ouch.
>
> I've forgotten the context... Why would you want string manipulation
> functions to return Unicode even when the result can be expressed as
> ASCII?

Hmm, Python does try very hard to hide the difference between ASCII
strings and Unicode, so you have a good point. What's missing is the
ability to clearly distinguish between ASCII strings and binary strings.
When a function that expects only ASCII or Unicode gets a binary
string, it might blow up, but not every time, and the source of the
error is often hard to find. This has caused pain for Zope 3 developers.
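
To illustrate the "might blow up, but not every time" failure described
above, here is a minimal sketch. It is written for Python 3, where str
and bytes are already separate types, so the hypothetical helper
py2_style_concat only imitates the implicit ASCII decoding that Python 2
applies when an 8-bit string is combined with a Unicode string. It is
not code from Zope 3.

```python
def py2_style_concat(text: str, raw: bytes) -> str:
    """Hypothetical helper imitating Python 2's implicit coercion.

    Python 2 decodes the 8-bit string with the default (ASCII) codec
    before joining it to a Unicode string; this function does the same
    thing explicitly.
    """
    return text + raw.decode("ascii")


# Works: every byte happens to be below 0x80, so nobody notices that
# 'raw' was never meant to be text.
print(py2_style_concat("name: ", b"shane"))

# Fails only once the data contains a high-bit byte, possibly far away
# from the code that produced the binary string in the first place.
try:
    py2_style_concat("name: ", b"caf\xc3\xa9")
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```

The same call succeeds or fails depending purely on the data, which is
why the original source of the binary string is hard to track down.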

What if strings had a "binary" flag? Any attempt to combine a binary
string with a Unicode string should fail, even if the binary string has
all of the high bits unset. Literal strings should be ASCII strings
unless they contain characters with the high bit set or '\0' characters.
Errors in combining strings with Unicode would probably be caught
earlier this way. This would be a little different from the new byte
array type, since binary strings would be immutable and share
implementation with ASCII strings.
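
The "binary" flag idea can be sketched roughly as follows. This is only
an illustration in Python 3 under stated assumptions: the class
FlaggedStr, its combine method, and the literal function are all
hypothetical names invented for this example, and a real implementation
would live inside the interpreter's string type rather than in a
wrapper class.

```python
class FlaggedStr:
    """Hypothetical immutable 8-bit string carrying a 'binary' flag."""

    def __init__(self, data: bytes, binary: bool):
        self._data = bytes(data)
        self._binary = bool(binary)

    @property
    def binary(self) -> bool:
        # Read-only: the flag is fixed when the string is created.
        return self._binary

    def combine(self, text: str) -> str:
        """Concatenate with Unicode text, refusing binary strings."""
        if self._binary:
            # Fails even when every byte is below 0x80, so the mistake
            # is reported on the first attempt, not only on "bad" data.
            raise TypeError("cannot combine a binary string with Unicode")
        return self._data.decode("ascii") + text


def literal(data: bytes) -> FlaggedStr:
    """Classify a string literal using the rule proposed above.

    A literal is treated as binary if it contains a NUL character or
    any byte with the high bit set; otherwise it is an ASCII string.
    """
    is_binary = b"\0" in data or any(byte >= 0x80 for byte in data)
    return FlaggedStr(data, is_binary)


print(literal(b"abc").combine(" def"))   # ASCII literal: allowed
print(literal(b"GIF89a\0").binary)       # NUL byte marks it as binary

try:
    FlaggedStr(b"abc", binary=True).combine(" def")
except TypeError as exc:
    print("rejected:", exc)
```

The point of the last case is that the failure depends on the flag and
not on the bytes, so the error shows up at the first combination.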

Shane