[Zope3-dev] i18n, unicode, and the underline

Guido van Rossum guido@python.org
Fri, 11 Apr 2003 11:29:13 -0400


> On Fri, 2003-04-11 at 09:26, Stephan Richter wrote:
> 
> > Barry is completely right with this! After a long discussion we decided to 
> > have all human interface strings as unicode. Also, the _() is needed for 
> > translations; if Barry figures out how to do it without this, fine; if not, 
> > they need to stay too.
> 
> I just realized there's another reason not to like this much: we'll have
> to add a type test to _() since it'll be perfectly valid to use a Unicode
> string here too.  You can't pass a Unicode string as the first argument
> to the unicode() built-in.
> 
> So you'd have to do something like:
> 
> if not isinstance(s, unicode):
>    s = unicode(s, 'us-ascii')
> 
> Blech.  Eventually all strings in Python will be Unicode just like in
> Jython and then we can remove all the fiddly little u's :).
> 
> -Barry

Before this misinformation spreads: Barry underestimates unicode()!

Without an encoding argument, unicode(u) accepts a unicode string and
returns it unchanged, and unicode(s) also accepts an 8-bit string and
attempts to convert it using the ASCII encoding -- exactly what _()
should do.
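
The behaviour described above can be sketched roughly as follows.  This
is only an illustration, not the actual _() from Zope 3: to_text is a
hypothetical name, and the Python 3 branch is an assumption added so the
sketch also runs where the unicode built-in no longer exists.

```python
import sys

# On Python 2 the text type is the unicode built-in; on Python 3 it is str.
PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # 'unicode' is only evaluated on Python 2


def to_text(s):
    """Hypothetical helper: coerce s to text the way unicode(s) does."""
    if PY2:
        # unicode(u) returns a unicode string unchanged, and unicode(s)
        # decodes an 8-bit string with the default (ASCII) encoding, so
        # no isinstance() test is needed here.
        return text_type(s)
    # Assumption for Python 3: there is no implicit decoding, so the
    # ASCII step has to be spelled out for 8-bit (bytes) input.
    if isinstance(s, bytes):
        return s.decode("ascii")
    return s


print(repr(to_text(u"hello")))   # text in, same text out
print(repr(to_text(b"hello")))   # 8-bit string in, text out
```

Non-ASCII 8-bit input raises UnicodeDecodeError in either branch, which
is presumably what you want for untranslated message ids.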

Of course, concatenating with u"" (that is, s+u"") works exactly the
same.  And according to timeit, it's more than twice as fast!

[guido@odiug linux]$ ./python ../Lib/timeit.py -s 's=""' 'unicode(s)'
1000000 loops, best of 3: 1.43 usec per loop
[guido@odiug linux]$ ./python ../Lib/timeit.py -s 's=""' 's+u""'
1000000 loops, best of 3: 0.668 usec per loop
[guido@odiug linux]$ 
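
For anyone who wants to repeat the measurement from a script rather
than the command line, here is a rough sketch using the timeit module.
The helper name best_usec and the loop count are arbitrary choices, and
the fallback to str is an assumption for interpreters without the
unicode built-in; absolute numbers will differ from the ones above.

```python
import timeit

# Setup run once per timing: pick the text type and create the test string.
# Assumption: fall back to str where the unicode built-in does not exist.
SETUP = """
try:
    text = unicode
except NameError:
    text = str
s = ''
"""


def best_usec(stmt, number=100000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    timer = timeit.Timer(stmt, setup=SETUP)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


call_usec = best_usec("text(s)")
concat_usec = best_usec("s + u''")

print("text(s): %.3f usec per loop" % call_usec)
print("s + u'': %.3f usec per loop" % concat_usec)
```

Taking the minimum of several repeats mirrors the "best of 3" figure
that timeit.py reports.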

--Guido van Rossum (home page: http://www.python.org/~guido/)