[Zope3-dev] i18n, unicode, and the underline

Guido van Rossum guido@python.org
Tue, 15 Apr 2003 08:57:18 -0400


> > > I'm not suggesting inherent problems with Unicode or Python's
> > > implementation of it at all.
> > 
> Guido van Rossum wrote:
> > I'm glad you aren't.  I got feedback on Unicode from Jim that
> > strongly suggested *he* thought we'd done it all wrong, and I want
> > to nip this in the bud if I can.

[Martijn]
> He was frustrated by the automatic promotion of 8-bit strings
[containing ASCII only]
> to unicode, which I understand.

This is circular reasoning.  Jim's frustration doesn't come from the
automatic promotion but from tracking down certain Unicode bugs in
Zope 2, and he (misguidedly IMO) believed that without promotion these
bugs would have been easier to track down.  (Not that they wouldn't
have occurred at all!)

I'm quite sure that without such automatic promotion we'd all be very
frustrated about the difficulty of converting text-processing code to
Unicode.

> I do wish there was some better way to turn this behavior off on a
> per-module basis than going to site.py. But considering the
> tradeoffs the current behavior seems to be the best compromise.

Why do *you* want to turn this behavior off?

> [snip]
> > It's not clear to me how never auto-converting would make your life
> > easier.
> 
> You'd get a Unicode error in the place where you made a mistake,
> instead of in some later section of the code.  If you care that
> 8-bit strings are bytes (which Jim does when he writes networking
> code) then you want this behavior, for instance.

It should not be hard to typecheck the arguments to networking
routines to make sure they are 8-bit strings; then the rest of the
networking code won't have to worry about Unicode sneaking in.
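
A minimal sketch of such a check at the boundary of a networking
routine.  It is written for current Python, where the 8-bit string
type is called bytes; require_bytes and send_line are hypothetical
names used only for illustration, not anything from Zope.

```python
def require_bytes(value, name="data"):
    """Return value unchanged if it is an 8-bit string, else raise TypeError."""
    if not isinstance(value, bytes):
        raise TypeError(
            "%s must be an 8-bit string (bytes), got %s"
            % (name, type(value).__name__)
        )
    return value


def send_line(sock, line):
    """Hypothetical networking routine: send one CRLF-terminated line."""
    # Check once at the boundary; the code below can then assume bytes only.
    line = require_bytes(line, "line")
    sock.sendall(line + b"\r\n")
```

With a check like this, a Unicode string passed in by mistake fails at
the call to send_line rather than somewhere deeper in the networking
code.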

I also wonder if it would have eased the pain of tracking down those
problems if the contents of the offending strings had been shown in
the error message.  That would probably have revealed their source.
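
As an illustration of that idea, here is a sketch in current Python of
an explicit ASCII promotion that puts the repr of the offending string
into the error message.  The helper name promote_to_text is an
assumption made for this example; it only imitates the promotion
discussed here and is not how the interpreter implements it.

```python
def promote_to_text(value, encoding="ascii"):
    """Imitate the promotion of an 8-bit string to Unicode text."""
    if isinstance(value, str):
        # Already text; nothing to promote.
        return value
    try:
        return value.decode(encoding)
    except UnicodeDecodeError as exc:
        # Show the offending string itself, so its origin is easier to guess.
        raise ValueError(
            "cannot promote %r to text using %s: %s" % (value, encoding, exc)
        ) from exc
```

Calling promote_to_text(b"caf\xe9") raises a ValueError whose message
contains b'caf\xe9', which is usually enough of a hint about where the
non-ASCII data came from.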

> That said, autoconverting can also make one's life easier, especially
> in the transition environment we're in.

For example, most text processing code also uses string literals,
either to search for or to insert (e.g. "<" or "\n").  It's a pure
blessing that as long as these are ASCII, text processing code works
for 8-bit text as well as for Unicode.

[snip]

> I wasn't fully aware of the details of this last year, now I am, so
> it's not rocket science. :)

You may be underestimating yourself. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)