[Zope3-dev] i18n, unicode, and the underline

Guido van Rossum guido@python.org
Mon, 14 Apr 2003 08:13:53 -0400


> Guido van Rossum wrote:
> > The only time you get in trouble is when a non-Unicode string contains
> > non-ASCII characters.

[Martijn]
> And as I noted elsewhere, especially in systems not built for unicode,
> this can be a significant amount of trouble.
> 
> As long as the application gives an error very early when you do
> that, it's probably okay, though.

Right.  If all the I/O libraries deal with Unicode correctly, the only
remaining problem is 8-bit string literals containing non-ASCII data.
This is indeed an abomination and should be discouraged.  I wish
Python could issue a compile-time warning about these --
unfortunately, for non-internationalized applications in non-English
languages this is still a very common practice. :-(
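
To make the failure mode concrete, here is a minimal sketch.  It assumes
modern Python 3, where bytes stands in for the 8-bit string type discussed
above; the helper name is_ascii_safe is made up for illustration and is
not part of any library.

```python
# Python 3 sketch: bytes plays the role of the old 8-bit string type.
raw = b"caf\xe9"  # "cafe" with an accented e, encoded as Latin-1; 0xE9 is non-ASCII


def is_ascii_safe(data: bytes) -> bool:
    """Return True if data decodes as ASCII without error (hypothetical helper)."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


ascii_ok = is_ascii_safe(b"cafe")    # pure ASCII: safe to mix with Unicode
non_ascii_ok = is_ascii_safe(raw)    # non-ASCII byte: the ASCII decode fails
as_latin1 = raw.decode("latin-1")    # meaningful only once the encoding is named
```

The bytes themselves do not say which encoding they are in, which is why
a literal like this only works by accident of the author's locale.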

> I think custom-built forms right now (not those using schema/forms)
> would deliver 8-bit strings to the application logic. We need to
> make sure that people are discouraged from doing so, as it can lead
> to confusing bugs as soon as these strings leak into application
> code -- they need to be converted to unicode as soon as they enter
> the application. Likewise for relational database adapters; I wonder
> what they do currently. If they pass latin-1 strings directly into
> the rest of Zope 3 then that's wrong, for instance.

Right, and the prohibition on 8-bit literals doesn't help prevent this
at all.
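
The "convert as soon as it enters the application" rule could look roughly
like the sketch below.  Again this assumes Python 3 (bytes for 8-bit
strings, str for Unicode); to_text is a hypothetical helper, not an
existing Zope or database adapter API, and the caller is assumed to know
the encoding of the incoming data.

```python
def to_text(value, encoding):
    """Hypothetical boundary helper: always hand Unicode text to the application.

    The encoding must be supplied by the caller (for example, from the form's
    declared charset or the database connection settings).
    """
    if isinstance(value, str):
        return value                    # already Unicode, pass through
    if isinstance(value, bytes):
        return value.decode(encoding)   # explicit decode at the boundary
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Illustrative input: the same city name arriving in two different encodings.
from_form = "Z\u00fcrich".encode("utf-8")
from_db = "Z\u00fcrich".encode("latin-1")

form_text = to_text(from_form, "utf-8")
db_text = to_text(from_db, "latin-1")

# Naming the wrong encoding fails here, at the boundary, rather than deep
# inside application code.
try:
    to_text(from_db, "utf-8")
    wrong_encoding_detected = False
except UnicodeDecodeError:
    wrong_encoding_detected = True
```

Note that a mismatch is only caught when the wrong codec can actually
fail; decoding arbitrary bytes as Latin-1 never raises, so that direction
produces garbled text silently.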

--Guido van Rossum (home page: http://www.python.org/~guido/)