[Zope3-dev] i18n, unicode, and the underline

Shane Hathaway shane@zope.com
Mon, 14 Apr 2003 12:56:55 -0400


Guido van Rossum wrote:
>>That still leaves the literal string "hole", and IMHO it's a really big 
>>hole.  Three methods have been suggested for patching this hole: 
>>prefixing all literal strings with "u", calling the unicode() builtin in 
>>code that concatenates strings, or using 'python -U'.  Since 'python -U' 
>>doesn't quite work, we only have the first two options for now, and both 
>>are a burden for the programmer.  One requires uglifying the source, and 
>>the other requires deeper knowledge than we wanted to require of 
>>programmers.  Ouch.
> 
> I've forgotten the context...  Why would you want string manipulation
> functions to return Unicode even when the result can be expressed as
> ASCII?

Hmm, Python does try very hard to hide the difference between ASCII 
strings and Unicode, so you have a good point.  What's missing is the 
ability to clearly distinguish between ASCII strings and binary strings.  
When a function that expects only ASCII or Unicode gets a binary 
string, it might blow up, but only when the data happens to contain 
non-ASCII bytes, so the source of the error is often hard to find.  This 
has caused pain for Zope 3 developers.
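
To illustrate the "blows up, but not every time" problem, here is a 
minimal sketch.  It is written for Python 3, where the implicit 
conversion no longer exists, so it emulates the Python 2 behaviour with 
an explicit ASCII decode.  The function name combine() is made up for 
this example.

```python
# Python 3 emulation of Python 2's implicit str -> unicode coercion.
# combine() is a hypothetical helper, not part of Zope or Python.

def combine(text, raw):
    """Concatenate text with a byte string, decoding the bytes as ASCII."""
    return text + raw.decode("ascii")


# Bytes that happen to be pure ASCII combine without complaint.
print(combine("name: ", b"shane"))

# The same call fails as soon as the data contains a high-bit byte,
# possibly far away from the code that produced the bytes.
try:
    combine("name: ", b"\x89PNG")
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```

The failure depends on the data rather than on the code path, which is 
why such bugs tend to surface late.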

What if strings had a "binary" flag?  Any attempt to combine a binary 
string with a Unicode string should fail, even if the binary string has 
all of the high bits unset.  Literal strings should be ASCII strings 
unless they contain characters with the high bit set or '\0' characters.

Errors in combining strings with Unicode would probably be caught 
earlier this way.  This would be a little different from the new byte 
array type, since binary strings would be immutable and share 
implementation with ASCII strings.
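
A rough sketch of how the flag idea might behave, written as a Python 3 
emulation.  The class FlaggedBytes and the helper combine() are 
hypothetical names invented for this illustration; nothing like them 
exists in Python, and a real implementation would live inside the 
string type itself.

```python
# Hypothetical sketch of the "binary flag" proposal; not a real Python API.

class FlaggedBytes(bytes):
    """Immutable byte string that carries a 'binary' flag."""

    def __new__(cls, data, binary=None):
        self = super().__new__(cls, data)
        if binary is None:
            # Literal rule from the proposal: treat the string as ASCII
            # unless a byte has the high bit set or is '\0'.
            binary = any(b >= 0x80 or b == 0 for b in self)
        self.binary = binary
        return self


def combine(text, s):
    """Combine text with a FlaggedBytes, refusing binary strings outright."""
    if s.binary:
        # Fails even when every byte would decode cleanly as ASCII.
        raise TypeError("cannot combine a binary string with Unicode")
    return text + s.decode("ascii")


print(combine("x", FlaggedBytes(b"yz")))    # flagged ASCII: allowed

try:
    # All high bits unset, but explicitly marked binary: rejected anyway.
    combine("x", FlaggedBytes(b"GIF89a", binary=True))
except TypeError as exc:
    print("rejected:", exc)
```

With this rule the error shows up at the first attempt to mix the two 
kinds of string, regardless of what the bytes happen to contain.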

Shane