[Zope3-dev] i18n, unicode, and the underline

Martijn Faassen faassen@vet.uu.nl
Tue, 15 Apr 2003 20:51:23 +0200


Shane Hathaway wrote:
> Martijn Faassen wrote:
> >Shane Hathaway wrote:
> >[snip]
> >>So, everyone, aren't there any other examples of binary strings mixing 
> >>unexpectedly with Unicode?  If not, surely the "u" prefix is unnecessary.
> >
> >Why would the u prefix help with this anyway? You'd still have binary
> >strings mixing unexpectedly with unicode, u prefix for literals or not.
> >
> >Of course these problems exist. It's our job as Zope 3 framework developers
> >to take these problems out of the user's hands, give them unicode
> >to work with, and let the framework take care of encoding issues as much
> >as possible. That way experts only have to worry about it in a few isolated
> >places, instead of developers worrying about it everywhere.
> 
> Because Unicode support is in transition, it's not quite that simple, 

I don't see why it is not quite that simple. The only respect in which I can
see Python being worse off than Java is that the automatic 'upcasting' to
unicode takes place implicitly, so we get errors in odd places. This is only
a problem when 8-bit strings in an encoding other than ascii sneak into a
unicode-using framework.
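
To make the failure mode concrete, here is a minimal sketch. It is written
for a current Python, where the coercion is no longer implicit, so it
emulates what happens when unicode text is joined with an 8-bit string by
decoding that string as ascii explicitly. The function names are made up
for illustration and are not part of any framework.

```python
def implicit_upcast_concat(text, raw):
    """Emulate joining unicode text with an 8-bit string.

    Assumption: the 8-bit operand is decoded with the ascii codec,
    which is what the implicit 'upcasting' amounts to.
    """
    return text + raw.decode("ascii")


def try_concat(text, raw):
    """Return the joined text, or a short marker if the upcast fails."""
    try:
        return implicit_upcast_concat(text, raw)
    except UnicodeDecodeError:
        return "UnicodeDecodeError"


# Pure ascii bytes upcast silently, so the mix goes unnoticed.
ascii_result = try_concat("name: ", b"Martijn")

# Latin-1 encoded bytes contain a non-ascii byte, so the same code fails.
latin1_result = try_concat("name: ", "caf\u00e9".encode("latin-1"))

print(ascii_result)
print(latin1_result)
```

The same line of code succeeds or fails depending only on the data that
reaches it, which is why the error tends to show up far from the place
where the 8-bit string entered the system.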

> but we're not in a bad situation either.  Here is what I've concluded 
> from this discussion:
> 
> - Python Unicode support is headed the same direction as Java Unicode 
> support.
> 
> - There is no need to prefix all strings with "u".

There is no need to prefix ascii literals with 'u'.

> - We're still going to have spurious bugs, but we expect them to be 
> special cases.  For example, expressions like this will fail intermittently:
> 
> u"The OID of obj is %s" % obj._p_oid
> 
> ... but of course this expression is wrong; it should use repr().  It's 
> also for developers only.

If _p_oid is actually a sequence of bytes then yes, that would be wrong.

I guess this is what you mean by it not being that simple -- in Python
at present it is easier to make mistakes like this than it is in
Java, because the upcasting is implicit and sometimes happens to work.
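
A sketch of why the expression above fails only intermittently, again
emulating the implicit upcast with an explicit ascii decode. It assumes
_p_oid is an 8-byte string; the two sample values and the function names
are invented for the example.

```python
def format_oid_implicitly(oid):
    """Emulate u"The OID of obj is %s" % oid with an 8-bit oid.

    Assumption: the bytes are upcast with the ascii codec.
    """
    return "The OID of obj is %s" % oid.decode("ascii")


def format_oid_with_repr(oid):
    """Format the oid through repr(), which never needs to decode it."""
    return "The OID of obj is %r" % (oid,)


# Hypothetical sample oids: 8 bytes each, differing only in the last byte.
low_oid = b"\x00" * 7 + b"\x01"   # every byte is below 0x80
high_oid = b"\x00" * 7 + b"\xff"  # last byte is outside the ascii range

print(format_oid_with_repr(low_oid))
print(format_oid_with_repr(high_oid))

# Works for low_oid, since all of its bytes happen to be valid ascii.
format_oid_implicitly(low_oid)

try:
    format_oid_implicitly(high_oid)
    implicit_failed = False
except UnicodeDecodeError:
    implicit_failed = True

print(implicit_failed)
```

Code like this can pass every test that uses small oids and still break
later, while the repr() variant behaves the same for any byte values.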

I agree that using _p_oid in your expression almost certainly qualifies you
as a framework developer. :)

> P.S. I'd reply to each of your posts but I think you understood better 
> each time you posted, so I don't think I have to write very much. :-)

My understanding of unicode hasn't changed; you just seem to express yourself
confusingly, and I am not sure whether you understand or not. I presume
you do, but I think clarity of expression is important for this topic,
otherwise people will be confused. I don't want to hear "But Shane said there
is no need to prefix all strings with 'u'!" from someone who is using
non-ascii strings.

Regards,

Martijn