[Zope3-dev] i18n, unicode, and the underline

Shane Hathaway shane@zope.com
Mon, 14 Apr 2003 10:35:02 -0400


Guido van Rossum wrote:
>>From: Shane Hathaway <shane@zope.com>
>>
>>May I suggest that while Python's Unicode support is transitional, all 
>>methods and functions that expect to manipulate Unicode should convert 
>>strings to Unicode at runtime?  Not all functions would have to do this, 
>>only those that concatenate strings (I think).
> 
> 
> This can be interpreted in two ways.
> 
> One interpretation (which I approve of) is to say, if you have
> something that returns a string, make sure it returns a Unicode string
> even if the inputs are 8-bit strings.  That could be done with a
> simple unicode() cast at the end (or if you need more speed, append
> u"" to the result).  This will work if the inputs already were Unicode
> strings, and also if they were 8-bit strings containing only ASCII
> characters (byte values 0-127 inclusive).  It will fail if an input
> was an 8-bit string containing a non-ASCII byte (byte values >= 128).
> Or you could make all the inputs Unicode in the same way.
> 
> Another interpretation would be to require such functions to be aware
> of the encoding used by 8-bit strings, and convert them to Unicode
> *using the right encoding*.  This seems the wrong idea, so I hope this
> isn't what you meant. :-)

Making everything aware of encodings would certainly be bad, I agree.  I 
intended the first interpretation.
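
To make the first interpretation concrete, here is a minimal sketch.  The
helper names to_unicode and join_as_unicode are hypothetical, and the
explicit decode("ascii") is an assumption standing in for the implicit
ASCII conversion that unicode() performs on an 8-bit string:

```python
# Hypothetical helpers; a sketch, not an existing Zope or Python API.
text_type = type(u"")  # the Unicode string type of the running interpreter


def to_unicode(value):
    """Return value as a Unicode string without guessing an encoding."""
    if isinstance(value, text_type):
        return value
    # 8-bit input: only pure ASCII is accepted.  A byte >= 128 raises
    # UnicodeDecodeError instead of being silently misinterpreted.
    return value.decode("ascii")


def join_as_unicode(*parts):
    """Concatenate 8-bit and Unicode strings; the result is always Unicode."""
    return u"".join(to_unicode(part) for part in parts)
```

ASCII-only 8-bit input and Unicode input both come out as Unicode.  An
8-bit string containing a byte >= 128 raises UnicodeDecodeError, so the
function never has to know which encoding the caller had in mind.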

BTW Java, in spite of its deficiencies, got Unicode right by making it 
the default, and I'm happy to see Python learn from the same experience.

Have you considered the possibility of adding a __future__ statement to 
make Python create Unicode objects instead of strings by default?  (But 
for all I know, that could already be done or rejected.  Feel free to 
say so. :-) )

Shane