[Zope-CMF] Charsets

Charlie Clark charlie at begeistert.org
Sun Jan 18 16:30:10 EST 2009


On 18.01.2009 at 20:36, Dieter Maurer wrote:

> The "Accept-Charset" request header should *never* be used
> to guess a charset at the server side:
>
>  "Accept-Charset" is a user preference which does not know
>  anything about charsets used by the server.
>
> If "utf-8" were not given preference in the
> current code, the code base would see massive problems.
>
> Only the server knows which charsets it is using -- and it should
> use a single one (with very few exceptions).
> There should be a configuration option that names this charset,
> and this should be used to decode form data.


Dieter,

I fully accept that you know both the specifications and, in
particular, the Zope internals better than I do. I am not, however,
suggesting that accept-charset be used any more than Zope already
uses it, for precisely the reasons you give.

From the current HTML specification:

"accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input  
data that is accepted by the server processing this form. The value is  
a space- and/or comma-delimited list of charset values. The client  
must interpret this list as an exclusive-or list, i.e., the server is  
able to accept any single character encoding per entity received."

i.e. exactly as you have suggested: it is possible to force a client to
encode data in a particular charset before sending it to the server.
All the references I have come across suggest that this attribute,
together with the Content-Type meta tag, can and should be used to
coerce browsers into using UTF-8. On the other hand, whenever
CMFDefault.utils.decode is called, the extremely unreliable
getBrowserCharset() is used, which will usually return iso-8859-1. It
is probably down to the way I have set up my site, but as a result I
currently have problems when using different browsers unless I
override the default adapter.
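
To illustrate the kind of mismatch I mean, here is a minimal sketch in
plain Python 3. It is not CMF code: the variable names are made up, and
it only assumes that the browser submitted UTF-8 while the server side
guessed iso-8859-1.

```python
# Hypothetical illustration, not CMF code: the browser submits UTF-8
# bytes, but the server decodes them with a guessed charset.
submitted = "Düsseldorf".encode("utf-8")

# Decoding with the guessed charset never fails, it just yields garbage,
# because every byte value is valid in iso-8859-1.
wrong = submitted.decode("iso-8859-1")

# Decoding with the charset the form was actually sent in gives the
# original text back.
right = submitted.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

The point is that the wrong guess does not raise an error, so the
corrupted string can end up stored without anybody noticing.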

Regarding my current configuration:
default-zpublisher-encoding = utf-8
default-charset = utf-8

All content objects are edited through formlib-derived forms and data
is stored as unicode. With a default CMF install I have not been able
to work with non-ASCII strings across OS and browser boundaries. If
possible I will try to create test cases that demonstrate the problems.
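
A rough idea of what such test cases might check, again as a plain
Python 3 sketch. The helper names decode_form_value and round_trips are
hypothetical and exist neither in Zope nor in CMF; the sketch simply
assumes a single configured charset that is used to decode form data.

```python
def decode_form_value(raw, charset="utf-8"):
    """Decode raw form bytes with the one configured charset.

    Hypothetical helper: strict decoding, so a mismatch raises
    UnicodeDecodeError instead of silently producing garbage.
    """
    return raw.decode(charset)


def round_trips(text, charset="utf-8"):
    """Return True if text survives an encode/decode cycle unchanged."""
    return decode_form_value(text.encode(charset), charset) == text


# Non-ASCII samples that should survive the round trip.
samples = ["Düsseldorf", "naïve", "日本語"]
results = [round_trips(sample) for sample in samples]

# Bytes sent in iso-8859-1 are rejected rather than decoded wrongly.
try:
    decode_form_value("Düsseldorf".encode("iso-8859-1"))
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
```

Real tests would of course have to go through the publisher and the
formlib forms rather than a stand-alone function.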

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




