[Zope3-dev] i18n, unicode, and the underline
Martijn Faassen
faassen@vet.uu.nl
Tue, 15 Apr 2003 20:51:23 +0200
Shane Hathaway wrote:
> Martijn Faassen wrote:
> >Shane Hathaway wrote:
> >[snip]
> >>So, everyone, aren't there any other examples of binary strings mixing
> >>unexpectedly with Unicode? If not, surely the "u" prefix is unnecessary.
> >
> >Why would the u prefix help with this anyway? You'd still have binary
> >strings mixing unexpectedly with unicode, u prefix for literals or not.
> >
> >Of course these problems exist. It's our job as Zope 3 framework developers
> >to take these problems out of the user's hands, give them unicode
> >to work with, and let the framework take care of encoding issues as much
> >as possible. That way experts only have to worry about it in a few isolated
> >places, instead of developers worrying about it everywhere.
>
> Because Unicode support is in transition, it's not quite that simple,
I don't see why it is not quite that simple. The only way I can conceive of
Python being worse than Java here is that, because the automatic 'upcasting'
to unicode takes place implicitly, we get errors in odd places. This is only
a problem when 8-bit strings in an encoding other than ASCII sneak into a
unicode-using framework.
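A minimal sketch of that failure mode, written for Python 3, where bytes and
text never mix implicitly. The helper py2_style_concat is hypothetical: it
spells out the ASCII decode that Python 2 performs silently when an 8-bit
string meets a unicode string, so the behaviour can be seen explicitly.

```python
def py2_style_concat(text, raw):
    """Emulate Python 2's implicit upcast: decode the 8-bit string as ASCII, then join."""
    return text + raw.decode("ascii")


# ASCII-only bytes upcast without complaint, so the code appears to work.
ok = py2_style_concat("name: ", b"Martijn")

# Latin-1 bytes for "café" contain a non-ASCII byte, so the upcast fails,
# and it fails at the point of mixing rather than where the bytes came in.
try:
    py2_style_concat("name: ", "café".encode("latin-1"))
    failed = False
except UnicodeDecodeError:
    failed = True
```

The error surfaces wherever the two string types first meet, which may be a
long way from the code that let the non-ASCII bytes in.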
> but we're not in a bad situation either. Here is what I've concluded
> from this discussion:
>
> - Python Unicode support is headed the same direction as Java Unicode
> support.
>
> - There is no need to prefix all strings with "u".
There is no need to prefix ascii literals with 'u'.
> - We're still going to have spurious bugs, but we expect them to be
> special cases. For example, expressions like this will fail intermittently:
>
> u"The OID of obj is %s" % obj._p_oid
>
> ... but of course this expression is wrong; it should use repr(). It's
> also for developers only.
If _p_oid is actually a sequence of bytes then yes, that would be wrong.
I guess this is what you mean by it not being that simple -- in Python
at present it is easier to make mistakes like this than it is in
Java, because the implicit upcasting sometimes works and sometimes doesn't.
I agree that using _p_oid in your expression almost certainly qualifies you
as a framework developer. :)
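A rough illustration of why that expression fails only intermittently,
assuming (as above) that _p_oid is a raw byte string. It is written for
Python 3; py2_style_format is a hypothetical helper that emulates Python 2's
implicit ASCII upcast, and the two sample OID values are made up.

```python
TEMPLATE = "The OID of obj is %s"


def py2_style_format(template, raw):
    # Hypothetical emulation of Python 2's implicit ASCII upcast when an
    # 8-bit string is formatted into a unicode template.
    return template % raw.decode("ascii")


def formats_ok(raw):
    """Return True if the emulated implicit upcast succeeds for these bytes."""
    try:
        py2_style_format(TEMPLATE, raw)
        return True
    except UnicodeDecodeError:
        return False


low_oid = b"\x00" * 7 + b"\x01"   # made-up value, every byte below 0x80
high_oid = b"\x00" * 7 + b"\xff"  # made-up value, last byte is 0x80 or above

# Works for the first value and fails for the second: an intermittent bug.
outcomes = [formats_ok(low_oid), formats_ok(high_oid)]

# repr() produces plain ASCII text for any byte value, so it always formats.
safe = TEMPLATE % repr(high_oid)
```

Whether the formatting succeeds depends entirely on the byte values, which is
why a bug like this can pass testing and only show up later.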
> P.S. I'd reply to each of your posts but I think you understood better
> each time you posted, so I don't think I have to write very much. :-)
My understanding of unicode hasn't changed; you just seem to express yourself
confusingly, and I am not sure whether you understand or not. I presume
you do, but I think clarity of expression is important for this topic,
otherwise people will be confused. I don't want to hear "But Shane said there
is no need to prefix all strings with 'u'!" when the person saying this
is using non-ascii strings.
Regards,
Martijn