[Zope3-dev] HTTP_ACCEPT_CHARSET header

Stuart Bishop stuart at stuartbishop.net
Wed Jun 30 11:24:51 EDT 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 30/06/2004, at 4:22 PM, Bjorn Tillenius wrote:

>> I was wondering if just ignoring HTTP_ACCEPT_CHARSET altogether
>> would be the sanest approach, or at the very least using a character
>> set that can encode the entire Unicode space such as UTF-8 or UTF-16
>> if the browser says it is at all possible.
>
> I don't think it's very nice to just ignore it. One way to go though,
> is to try to use every encoding the user prefers. If all encodings
> fail, use utf-8. But I'm not sure what the specs say about this header.
> If it's supposed to be all the charsets the user will accept, then
> maybe we shouldn't send it in some other charset than specified, and
> instead raise an error.

Looking at the code, it looks like Zope3 is handling this well
by always preferring UTF-8 if HTTP_ACCEPT_CHARSET specifies it,
no matter its priority. I missed that bit, despite the great big
comment :-)
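
As a rough illustration of that rule (not the actual Zope3 code), here is
a minimal Python sketch. The names parse_accept_charset and choose_charset
are hypothetical, and the handling of a missing header, the wildcard and
an empty preference list are assumptions based on the behaviour discussed
in this thread.

```python
def parse_accept_charset(header):
    """Split an Accept-Charset value into (charset, q) pairs."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset without an explicit q value counts as q=1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        prefs.append((name.strip().lower(), q))
    return prefs


def choose_charset(header):
    """Pick a response charset, always preferring UTF-8 when it is allowed."""
    if header is None or not header.strip():
        return "utf-8"  # no header sent: default to UTF-8
    acceptable = [(name, q) for name, q in parse_accept_charset(header) if q > 0]
    names = {name for name, _ in acceptable}
    if "utf-8" in names or "*" in names:
        return "utf-8"  # UTF-8 wins regardless of its priority
    if not acceptable:
        return "utf-8"  # assumption: fall back rather than raise an error
    # Otherwise take the highest q; max() keeps the first entry on ties.
    return max(acceptable, key=lambda pref: pref[1])[0]


print(choose_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
```

With the Mozilla 1.7 header this picks utf-8 even though ISO-8859-1 has
the higher priority.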

>> An example of when this is necessary is users pasting data into
>> HTML forms from other applications. The browser will send the
>> data in the character set the page is encoded in, and choose some
>> other arbitrary character set that can encode it if this cannot be
>> done. So when I paste some text from MS-Word into that nice ISO-8859-1
>> form Zope3 sent me (because my browser said it would prefer it),
>> I get a UnicodeEncodeError because Safari helpfully sent it as
>> UTF-8 since “Smart Quotes” and ISO-8859-1 don't mix.
>
> Have you actually tried this? I think, if the browser sends something
> using utf-8, it should also say it prefers it. But I'm not sure how
> different browsers work.

Safari used to do this. Testing again, it now replaces
unencodable characters with '?'. Mozilla & Firefox both
replace unencodable characters with XML entity references.
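
Both behaviours can be mimicked with Python's standard codec error
handlers. This is only a sketch of the effect on the submitted data, not
what the browsers actually run, and the sample string is my own:

```python
# Text with curly quotes, which have no ISO-8859-1 representation.
text = "\u201cSmart Quotes\u201d"

# Strict encoding fails, which is the UnicodeEncodeError case.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# What Safari now appears to do: substitute '?'.
as_question_marks = text.encode("iso-8859-1", errors="replace")

# What Mozilla and Firefox appear to do: numeric character references.
as_char_refs = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(strict_failed)
print(as_question_marks)
print(as_char_refs)
```

The '?' form loses the original characters for good, while the character
references can in principle be decoded again on the server.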

Safari doesn't seem to send an accept-charset header (although it
does send accept:, accept-encoding: and accept-language:), so the
current rules have it defaulting to UTF-8.

Mozilla 1.7 sends "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"

IE5 for Mac doesn't let me paste arbitrary Unicode into a form
at all; it forces the pasted text into the page's encoding.

So it is only a problem if a browser sends accept-charset:
without specifying UTF-8 or the wildcard. I don't have any browsers
that do that on this Mac.
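
If someone wants to look for such browsers in their logs, the condition
could be checked with something like the following. The function name
lacks_unicode_option is made up for this sketch, and it deliberately
ignores q values, so a charset listed with q=0 still counts as listed.

```python
def lacks_unicode_option(header):
    """True if Accept-Charset was sent but names neither UTF-8 nor '*'."""
    if header is None or not header.strip():
        return False  # no header at all: the UTF-8 default applies
    # Keep only the charset names, dropping any ';q=...' parameters.
    names = {item.partition(";")[0].strip().lower() for item in header.split(",")}
    return not ({"utf-8", "*"} & names)


print(lacks_unicode_option("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
print(lacks_unicode_option("ISO-8859-1"))
```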

I suspect that the processing being done might be pointless,
since all browsers probably either omit the accept-charset
header entirely or list UTF-8 as an allowed encoding.
But I doubt we can prove that today, let alone tomorrow ;)

> No it doesn't. If the browser requests a form, saying it prefers
> iso-8859-1, then it sends the form data, encodes it using utf-8, also
> saying it prefers utf-8. HTTP_ACCEPT_CHARSET has changed, but it still
>
> The real problem here is that it's impossible to know for sure which
> encoding is used. This approach works better than the one before. If
> you have any better solution, please share it (ignoring
> HTTP_ACCEPT_CHARSET is worse IMHO).

Yes - the 'always prefer UTF-8' feature makes it work happily.
- --  
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (Darwin)

iD8DBQFA4ttDAfqZj7rGN0oRAm9hAKCb4E/fwB/3EAqV9Sfhxc0ABUzXuwCfcsCP
mccgXrYAeD2Zxxmg6+orsFU=
=3qph
-----END PGP SIGNATURE-----