[Zope3-dev] Re: unicode problems !?

Martijn Faassen faassen at infrae.com
Tue Oct 12 09:27:24 EDT 2004


Hey,

I'm not sure I understood the entire debate, but I'll summarize what I 
think should be happening:

* if a user edits a textarea, then assume that the encoding of the form 
submission is that of the presented form, or alternatively generate some 
explicit encoding setting in the form, as we previously discussed on 
this list. The default for this encoding in Zope should be UTF-8. 
Content that is saved is decoded from UTF-8 and stored as unicode. In my 
experience browsers, including IE, do submit form data in the same 
encoding in which the form was presented; we rely on this heavily in 
Silva, for instance. Silva uses unicode internally throughout.

* if a user uploads a file in some way, and the file is intended to be 
textual data, then the encoding of this file is assumed to be UTF-8 by 
default. However, the user can specify an encoding to override this. The 
textual data is decoded using this encoding, and stored as unicode. If 
the decoding fails, then the user needs to be presented with an error. 
We have some experience implementing something like this in Silva, where 
we provide a Comma Separated Value object (in the SilvaExternalSources 
extension). Users explicitly specify the encoding of the uploaded CSV 
data here, and data is stored as unicode.

* if a user uploads a file and this file is *not* intended to be textual 
data but binary data, then Zope doesn't do a thing, and just stores the 
bytes. If the developer still uses this data as text at any stage, they 
should be aware of encoding issues and decode in whatever encoding they 
see fit. Of course the developer is better off using a stored text file 
in that case, where unicode is already guaranteed.
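
As a rough illustration of the three cases above, here is a minimal 
sketch in plain Python 3 (where `str` is the unicode type). It does not 
use any Zope or Silva API; the function names, the `UploadDecodeError` 
class, and the `DEFAULT_ENCODING` constant are hypothetical names made 
up for this example.

```python
DEFAULT_ENCODING = "utf-8"  # assumed default, following the proposal above


class UploadDecodeError(ValueError):
    """Hypothetical error to show to the user when decoding fails."""


def decode_form_field(raw, form_encoding=DEFAULT_ENCODING):
    """Case 1: decode submitted form bytes with the encoding the form was served in."""
    return raw.decode(form_encoding)


def decode_text_upload(raw, encoding=None):
    """Case 2: decode an uploaded text file, UTF-8 unless the user overrides it."""
    chosen = encoding or DEFAULT_ENCODING
    try:
        return raw.decode(chosen)
    except (UnicodeDecodeError, LookupError) as exc:
        # LookupError covers an encoding name that Python does not know.
        raise UploadDecodeError(
            "could not decode upload as %s: %s" % (chosen, exc)
        ) from exc


def store_binary_upload(raw):
    """Case 3: binary data is stored as bytes, untouched."""
    return raw
```

For instance, `decode_text_upload(data, "latin-1")` would decode a 
Latin-1 CSV upload, while calling it without an encoding on Latin-1 
bytes that contain accented characters would typically raise the error 
that should be presented to the user.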

Regards,

Martijn

