[Zope3-dev] i18n, unicode, and the underline

Guido van Rossum guido@python.org
Mon, 14 Apr 2003 10:18:06 -0400


> From: Shane Hathaway <shane@zope.com>
>
> May I suggest that while Python's Unicode support is transitional, all 
> methods and functions that expect to manipulate Unicode should convert 
> strings to Unicode at runtime?  Not all functions would have to do this, 
> only those that concatenate strings (I think).

This can be interpreted in two ways.

One interpretation (which I approve of) is to say, if you have
something that returns a string, make sure it returns a Unicode string
even if the inputs are 8-bit strings.  That could be done with a
simple unicode() cast at the end (or if you need more speed, append
u"" to the result).  This will work if the inputs already were Unicode
strings, and also if they were 8-bit strings containing only ASCII
characters (byte values 0-127 inclusive).  It will fail with a
UnicodeError if an input was an 8-bit string containing a non-ASCII
byte (byte values >= 128).
Or you could make all the inputs Unicode in the same way.
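
A minimal sketch of the first interpretation, written for Python 3,
where bytes stands in for the 8-bit string type and str for the
Unicode string type.  The helper names to_text, concat_as_text and
try_concat are hypothetical, and the explicit ASCII decode is an
assumption that mimics the implicit conversion described above:

```python
def to_text(value):
    """Return value as text, decoding byte strings as strict ASCII."""
    if isinstance(value, str):
        # Already a Unicode string: nothing to convert.
        return value
    # 8-bit input: only byte values 0-127 are accepted.
    return value.decode("ascii")


def concat_as_text(*parts):
    """Concatenate parts, converting every input to text first."""
    return "".join(to_text(part) for part in parts)


def try_concat(*parts):
    """Like concat_as_text, but return None instead of raising."""
    try:
        return concat_as_text(*parts)
    except UnicodeDecodeError:
        # A non-ASCII byte (value >= 128) was found in an 8-bit input.
        return None
```

Unicode inputs and pure-ASCII 8-bit inputs both go through; an 8-bit
input containing a byte >= 128 is rejected rather than guessed at.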

Another interpretation would be to require such functions to be aware
of the encoding used by 8-bit strings, and convert them to Unicode
*using the right encoding*.  This seems like the wrong idea, so I hope
this isn't what you meant. :-)
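
As an illustration of what this second interpretation would demand
(the message does not spell out its reasoning, so this is only one
possible reading), the Python 3 sketch below decodes the same single
byte under several encodings.  The helper decodes_as is hypothetical,
and the choice of encodings is arbitrary:

```python
raw = b"\xe9"  # one non-ASCII byte, with no record of its encoding

# The same byte is a different character under different encodings.
as_latin1 = raw.decode("latin-1")   # U+00E9, e with acute accent
as_cp1251 = raw.decode("cp1251")    # U+0439, Cyrillic short i


def decodes_as(data, encoding):
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False
```

Nothing in the byte string itself says which of these results is the
intended one, so a function would need that information from somewhere
else.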

--Guido van Rossum (home page: http://www.python.org/~guido/)