[Zope3-dev] i18n, unicode, and the underline

Guido van Rossum guido@python.org
Mon, 14 Apr 2003 12:22:27 -0400


(Trying to wrap this up)

[me]
> > Using floats as e.g. sequence indices also raises an exception.
> 
> From: Martijn Faassen <faassen@vet.uu.nl>
>
> That's true, though that's outside the actual number manipulation code
> usually. And this would always raise an exception, not just sometimes
> dependent on input.

Not if the float only happened when a float was input; then it's about
the same situation.  (Example: a function is usually called with int
args; sometimes it is called with a float arg.)
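A minimal sketch of that situation, written for Python 3. The `pick` and `try_pick` helpers are hypothetical and do not come from the thread. The function works for the usual int argument and raises only when a caller happens to pass a float, so whether it fails depends on the input.

```python
def pick(items, index):
    # Works for int indices; a float index raises TypeError at runtime,
    # so the failure depends on what the caller passed in.
    return items[index]


def try_pick(items, index):
    """Return the picked item, or the name of the exception raised."""
    try:
        return pick(items, index)
    except TypeError as exc:
        return type(exc).__name__


print(try_pick(["a", "b", "c"], 1))    # b
print(try_pick(["a", "b", "c"], 1.0))  # TypeError
```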

> I'm not suggesting inherent problems with Unicode or Python's
> implementation of it at all.

I'm glad you aren't.  I got feedback on Unicode from Jim that strongly
suggested *he* thought we'd done it all wrong, and I want to nip this
in the bud if I can.

> Considering the transition requirements I had to reluctantly admit
> last summer that the current design is the right thing (reluctantly
> as it caused me a lot of pain and I felt there should be a better
> way :). I sometimes wish it didn't do any automatic conversion from
> string to unicode at all, but that is rather messy as well in other
> cases, so the current transition situation looks like the best
> compromise.

It's not clear to me how never auto-converting would make your life
easier.
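For context, Python 2 handled `str + unicode` by implicitly decoding the 8-bit string with the default encoding, which was ASCII. The sketch below emulates that rule in Python 3 using a hypothetical `py2_style_concat` helper, with `bytes` playing the role of the old 8-bit `str`. It shows why the implicit conversion fails only for some inputs.

```python
def py2_style_concat(left, right):
    """Hypothetical emulation of Python 2's str + unicode coercion.

    bytes plays the role of the old 8-bit str; any bytes operand is
    decoded with ASCII, as Python 2's default encoding did.
    """
    if isinstance(left, bytes):
        left = left.decode("ascii")
    if isinstance(right, bytes):
        right = right.decode("ascii")
    return left + right


print(py2_style_concat(b"abc", "def"))  # abcdef

try:
    py2_style_concat(b"caf\xe9", "!")  # contains the non-ASCII byte 0xE9
except UnicodeDecodeError as exc:
    print("fails only for this input:", exc.reason)
```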

> Would it hurt to introduce a new datatype for bytes in Python 2.x? I
> think that could increase the expressiveness of code that deals with
> bytes and would ease the transition to 3.0 as well. Would it cause
> backwards compatibility issues?

This should be done, but there's no time for Python 2.3.  I'd
appreciate help in writing a PEP.  The new bytes datatype would be
entirely separate from strings; there'd have to be a new "super binary
mode" for files to return bytes instead of strings from read().
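Python 3 later took roughly this route. `bytes` is a type separate from `str`, and a file opened in binary mode returns `bytes` from `read()`. The sketch below uses that Python 3 behaviour and does not reflect the details of the PEP requested here. The `read_both_ways` helper is hypothetical.

```python
import os
import tempfile


def read_both_ways(data, encoding="utf-8"):
    """Write raw bytes to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:  # binary mode: read() returns bytes
            raw = f.read()
        with open(path, "r", encoding=encoding, newline="") as f:  # text mode: str
            text = f.read()
    finally:
        os.remove(path)
    return raw, text


raw, text = read_both_ways("caf\u00e9".encode("utf-8"))
print(type(raw).__name__, raw)          # bytes b'caf\xc3\xa9'
print(type(text).__name__, len(text))   # str 4

# The two types do not mix implicitly.
try:
    raw + text
except TypeError:
    print("bytes + str raises TypeError")
```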

> [snip]
> > I wonder if using 8-bit strings encoded as UTF8 would have made things
> > easier than using Unicode strings?
> 
> Possibly. It wouldn't have been according to the DOM standard
> though. But of course in hindsight I would've cared less about
> that. :)

I thought that the DOM only required Unicode support and didn't spell
out how you did it.  Why wouldn't UTF8 be good enough?
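One practical difference between the two representations can be sketched in Python 3, with `bytes` standing in for a UTF-8 encoded 8-bit string. Lengths and slices count bytes rather than characters, so a positional slice can split a multi-byte character. Round-tripping the whole string, by contrast, is lossless.

```python
text = "caf\u00e9"            # 4 characters; U+00E9 is LATIN SMALL LETTER E WITH ACUTE
utf8 = text.encode("utf-8")   # 5 bytes; U+00E9 becomes 0xC3 0xA9

print(len(text), len(utf8))   # 4 5

# Slicing the encoded form by position can cut a character in half.
head = utf8[:4]               # b'caf\xc3'
try:
    head.decode("utf-8")
except UnicodeDecodeError as exc:
    print("truncated UTF-8:", exc.reason)

# Decoding the complete byte string recovers the original text.
print(utf8.decode("utf-8") == text)  # True
```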

> The problem of course is that some people complaining likely will
> have no clue about what's ascii and what's latin-1. :)
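For reference, the distinction is small but it matters. ASCII defines only byte values 0-127, while Latin-1 (ISO 8859-1) maps every byte 0-255 to the Unicode code point with the same number. The short Python 3 check below uses a hypothetical `decodes_as` helper to show that the same bytes can be valid Latin-1 and invalid ASCII.

```python
def decodes_as(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


sample = b"caf\xe9"  # "café" encoded as Latin-1
print(decodes_as(sample, "latin-1") == "caf\u00e9")  # True
print(decodes_as(sample, "ascii"))                   # None

# Latin-1 maps byte n to code point U+00nn for every n in 0..255.
assert all(bytes([n]).decode("latin-1") == chr(n) for n in range(256))
# ASCII accepts only the lower half.
assert all(decodes_as(bytes([n]), "ascii") is None for n in range(128, 256))
```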

There's no hope for them.

--Guido van Rossum (home page: http://www.python.org/~guido/)