[Zope3-dev] HTTP_ACCEPT_CHARSET header

Bjorn Tillenius bjoti777 at student.liu.se
Wed Jun 30 10:22:49 EDT 2004


On Wed, Jun 30, 2004 at 03:08:07PM +0200, Stuart Bishop wrote:
> On 29/06/2004, at 7:26 PM, Philipp von Weitershausen wrote:
> >Many mainstream browsers don't send an HTTP_ACCEPT_CHARSET header.
> >zope.publisher uses this header to deduce the encoding of form values;
> >if this header is missing though, it didn't convert them at all to 
> >unicode.
> >
> >Since Zope's fallback is 'UTF-8' everywhere whenever an encoding is not
> >specified, it should also fallback to trying to decode incoming form
> >data as UTF-8.
>
> I was wondering if just ignoring HTTP_ACCEPT_CHARSET altogether
> would be the sanest approach, or at the very least using a character
> set that can encode the entire Unicode space such as UTF-8 or UTF-16
> if the browser says it is at all possible.

I don't think it's very nice to just ignore it. One way to go, though, is
to try each encoding the user prefers in turn. If all of them fail,
use utf-8. But I'm not sure what the specs say about this header. If
it's supposed to list all the charsets the user will accept, then maybe we
shouldn't send the response in some charset other than those specified, and
should raise an error instead.
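
To make the "try each preferred encoding, then fall back to utf-8" idea
concrete, here is a rough sketch in plain Python. It is not what
zope.publisher does; parse_accept_charset and decode_form_value are
hypothetical names made up for this mail, and the q-value parsing is
deliberately simplistic.

```python
def parse_accept_charset(header):
    """Return charset names from an Accept-Charset value, most preferred first.

    Hypothetical helper: handles only "name" and "name;q=0.x" items.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name or name == "*":
            continue  # a wildcard names no concrete charset to try
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q > 0:
            # Sort by descending q, then by position in the header.
            prefs.append((-q, position, name))
    return [name for _, _, name in sorted(prefs)]


def decode_form_value(raw, accept_charset=""):
    """Decode raw bytes with the first preferred charset that works, else UTF-8."""
    for charset in parse_accept_charset(accept_charset):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown charset name: try the next one
    # Last resort; this still raises UnicodeDecodeError on invalid UTF-8.
    return raw.decode("utf-8")
```

For example, decode_form_value(b"caf\xe9", "utf-8, iso-8859-1;q=0.5")
rejects utf-8 and then succeeds with iso-8859-1. A successful decode only
means the bytes were valid in that charset, not that the guess was right.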

> An example of when this is necessary is users pasting data into
> HTML forms from other applications. The browser will send the
> data in the character set the page is encoded in, and choose some
> other arbitrary character set that can encode it if this cannot be done.
> So when I paste some text from MS-Word into that nice ISO-8859-1
> form Zope3 sent me (because my browser said it would prefer it),
> I get a UnicodeEncodeError because Safari helpfully sent it as
> UTF-8 since “Smart Quotes” and ISO-8859-1 don't mix.

Have you actually tried this? I think, if the browser sends something
using utf-8, it should also say it prefers it. But I'm not sure how
different browsers work.

> This approach also assumes that the HTTP_ACCEPT_CHARSET will not
> change between requests, which nobody promises.

No, it doesn't. Suppose the browser requests a form, saying it prefers
iso-8859-1, and then sends the form data encoded as utf-8, this time
saying it prefers utf-8. HTTP_ACCEPT_CHARSET has changed between the two
requests, but it still works, since each request is decoded using its own
header.

The real problem here is that it's impossible to know for sure which
encoding is used. This approach works better than the one before. If you
have any better solution, please share it (ignoring HTTP_ACCEPT_CHARSET
is worse IMHO).
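
A small illustration of why the bytes alone can't tell us the encoding,
using the smart quotes case from above. This is only a sketch in plain
Python with made-up variable names; the point is that iso-8859-1 maps
every possible byte to some character, so decoding with it never fails,
even when the data was really utf-8.

```python
text = "\u201cSmart Quotes\u201d"   # curly quotes, as pasted from MS-Word
raw = text.encode("utf-8")          # what a browser might send on the wire

as_utf8 = raw.decode("utf-8")          # round-trips to the original text
as_latin1 = raw.decode("iso-8859-1")   # also "succeeds", but yields mojibake

# The curly quotes have no iso-8859-1 representation at all.
try:
    text.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

print(as_utf8 == text)      # True
print(as_latin1 == text)    # False
print(latin1_can_encode)    # False
```

So if iso-8859-1 is tried before utf-8, the server gets a string without
any error, just not the one the user typed.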

Bjorn

