[Zope] Zope 2.6.1 and UTF-8

Toby Dickenson tdickenson at geminidataloggers.com
Wed Sep 10 17:35:40 EDT 2003


On Wednesday 10 September 2003 15:46, Chris Withers wrote:
> Trying again to bring it on list ;-)
>
> Chris Withers wrote:
> > (bringing on-list in case others are interested)
> >
> > Toby Dickenson wrote:
> >>> I've got some stuff that's in strings, so I guess not unicode, but
> >>> which is UTF-8 encoded, and I'm wondering how I make sure Zope does
> >>> "the right thing" here. Are there any docs about?
> >
> > (and just to be clear, I'm using Zope 2.6.1 with ZODB 3.1, what
> > differences will that make?)
> >
> >> I've submitted a chapter to one of the books that Chris M maintains...
> >> last I looked it still wasn't merged :(
> >> There is some info at
> >> http://zope.org/Members/htrd/howto/unicode
> >> http://zope.org/Members/htrd/howto/unicode-zdg-changes
> >
> > Just had a read of these, very interesting...
> >
> >> 1. convert your strings to either unicode objects or latin-1, so that
> >> dtml or zpt can do the right thing when combining them. (I've *still*
> >> not used zpt for this, but I assume it works).
> >
> > I will be using ZPT for this, what changes did you make so that ZPT's
> > return unicode strings?

I didn't, but I believe someone was reproducing my DTML semantics in ZPT. I
forget who was working on this...

> >> I recommend converting all language strings to unicode at the earliest
> >> opportunity as a general principle.
> >
> > Hmmm, that's interesting. I'd been planning on keeping everything as
> > UTF-8 encoded strings rather than actual unicode. What leads you to
> > suggest storing everything as unicode?

It's a question of choosing the right data type to represent your data. Doesn't
it make sense for string methods, character indexing, etc., to work on your
data as a sequence of unicode characters?

You wouldn't consider using an 8-bit string to store something that is
logically an integer, simply because you originally read it from a file or
socket in 8-bit string form. Why do the same to a unicode string? (Perl
programmers need not reply ;-)
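
As a rough illustration of the difference, here is a minimal sketch. It is
written in Python 3 terms, which is an assumption of the example: there
`bytes` plays the role of the 8-bit string and `str` the role of the unicode
type. The sample word is made up.

```python
# Python 3 sketch: bytes stands in for the 8-bit string, str for unicode.
text = "caf\u00e9"                # four characters; the last one is e-acute
encoded = text.encode("utf-8")    # five bytes; e-acute takes two bytes in UTF-8

char_count = len(text)            # counts characters
byte_count = len(encoded)         # counts bytes, not characters

last_char = text[-1]              # the whole accented character
last_byte = encoded[-1:]          # only the second byte of a two-byte sequence

upper_text = text.upper()         # case mapping handles the accented character
upper_bytes = encoded.upper()     # bytes.upper() only changes ASCII letters
```

Length, indexing and string methods give the answer you would expect only on
the unicode value; on the encoded bytes they operate on bytes.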

> >> 2. set a 'Content-Type' header with the value 'text/html;
> >> charset=UTF-8' (or whatever you prefer, but anything other than utf8
> >> has other complications) so that ZPublisher knows how to transmit the
> >> unicode response over http.
> >
> > What are these complications?
> > (luckily I'm going to be using UTF-8 ;-)

The rules for working out what encoding a browser will use when submitting a 
form are complicated, and depend on the encoding of the page that contained 
the form, POST/GET, and browser version. If your pages use UTF-8 then *all* 
form submissions come back in UTF-8. IMO it's a no-brainer choice if you have
forms (or might ever add one).
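
To make point 2 concrete, here is a minimal sketch of what "transmitting a
unicode response" amounts to: the charset named in the Content-Type header
decides how the unicode body becomes bytes. This is not ZPublisher's code.
`encode_body` is a hypothetical helper, the latin-1 fallback is an assumption
of the sketch, and it is written in Python 3 terms (`str` is the unicode type).

```python
from email.message import Message


def encode_body(body, content_type):
    """Hypothetical helper: encode a unicode body using the header's charset."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    # Falling back to latin-1 is an assumption made for this sketch only.
    charset = header.get_content_charset() or "latin-1"
    return body.encode(charset)


page = "<p>na\u00efve</p>"
utf8_bytes = encode_body(page, "text/html; charset=UTF-8")
latin1_bytes = encode_body(page, "text/html; charset=ISO-8859-1")
```

The same unicode page produces different bytes on the wire depending on the
declared charset, which is why the header has to be set explicitly.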

> >> 3. If there are http forms on those pages, you need to add extra
> >> marshalling tags so that ZPublisher knows what encoding your browser
> >> used when submitting the form.
> >
> > If I do, do I then end up with unicode or strings encoded with the
> > character set I specify?

You get to choose the right data type.....

If you want to receive a unicode string from a form that will be submitted by 
the browser in utf8, then use
<input name="description:utf8:ustring".....

If you want to receive a plain string containing latin-1 characters from a 
form that will be submitted by the browser in utf8, then use
<input name="postcode:utf8:string".....

If you want to receive the bytes as the browser sent them over the wire:
<input name="idontknowwhatthiswouldbefor:string".....
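
The three cases above can be sketched as plain conversions. This only
illustrates the effect described here, not ZPublisher's actual marshalling
code. It is in Python 3 terms (an assumption: `bytes` is the plain string,
`str` is unicode), and the sample value is made up.

```python
# Bytes as a browser would send them from a UTF-8 page (made-up sample value).
wire = "Zo\u00eb".encode("utf-8")

# description:utf8:ustring -> decode the UTF-8 bytes into a unicode string
as_ustring = wire.decode("utf-8")

# postcode:utf8:string -> decode from UTF-8, then re-encode as latin-1
as_latin1 = wire.decode("utf-8").encode("latin-1")

# name:string -> the bytes exactly as they arrived
as_raw = wire
```

Note that the latin-1 case can only work for characters that exist in
latin-1; anything else has no representation in that plain string.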

> > Finally, is ZCTextIndex compatible with either unicode or strings that
> > contain UTF-8 encoding?

No idea.


-- 
Toby Dickenson



