[Zope3-dev] Re: unicode problems !?

Bjorn Tillenius bjoti777 at student.liu.se
Tue Oct 12 14:38:57 EDT 2004


On Tue, Oct 12, 2004 at 03:27:24PM +0200, Martijn Faassen wrote:
> Hey,
> 
> I'm not sure I understood the entire debate, but I'll summarize what I 
> think should be happening:
> 
> * if a user edits a textarea, then assume the encoding of the form 
> submission is that of the presented form, or alternatively generate some 
> explicit encoding setting in the form, as we previously discussed on this 
> list. The default for this encoding in Zope should be UTF-8. Content that 
> is saved is decoded from UTF-8 and stored as unicode. In my experience 
> browsers, including IE, do submit form data in the same encoding as the 
> way the form was presented; we rely on this heavily in Silva, for 
> instance. Silva uses unicode internally throughout.

When the form value is read from the request, a unicode string is
returned. That part works today, assuming that all browsers do
"the right thing". The question is how the value should be stored.

> * if a user uploads a file in some way, and the file is intended to be 
> textual data, then the encoding of this file is assumed to be UTF-8 by 
> default. However the user can specify an encoding to override this. The 
> textual data is decoded using this encoding, and stored as unicode. If 
> the decoding fails, then the user needs to be presented with an error. 
> We have some experience implementing something like this in Silva, where 
> we provide a Comma Separated Value object (in the SilvaExternalSources 
> extension). Users explicitly specify the encoding of the uploaded CSV 
> data here, and data is stored as unicode.

Always storing the value as unicode is one quite good option. Another is
to always store it as UTF-8, so that we can use the same Byte field no
matter what the content type is. That's probably the easiest solution:
we only have to care about the encoding at the moment a user uploads a
file, and we don't have to add another attribute to the File class. One
disadvantage is that if I upload a file in a specific encoding, I might
want that same encoding to be used when the file is downloaded. I guess
I could provide a special view for that, though. Overall I like this
option better than adding an extra attribute.
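
A minimal sketch of the UTF-8 option, written for modern Python (bytes
and str) rather than the Python of this thread. The names
normalize_upload and UploadDecodeError are made up for illustration and
are not part of Zope:

```python
class UploadDecodeError(ValueError):
    """Hypothetical error shown to the user when decoding fails."""


def normalize_upload(raw, encoding="utf-8"):
    """Decode uploaded text with the given encoding, return UTF-8 bytes.

    `encoding` defaults to UTF-8; the user may override it at upload time.
    """
    try:
        text = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        # LookupError covers an encoding name Python does not know.
        raise UploadDecodeError(
            "could not decode upload as %r: %s" % (encoding, exc)
        ) from exc
    # From here on the stored bytes are always UTF-8, so no extra
    # "encoding" attribute is needed on the stored object.
    return text.encode("utf-8")


# A Latin-1 upload is normalized to UTF-8 before it is stored.
stored = normalize_upload("Björn".encode("latin-1"), encoding="latin-1")
```

The original encoding is lost after this step, which is exactly the
disadvantage mentioned above.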

One more thing, though. If we choose to store the text data as UTF-8, we
should either set the encoding of the response or decode the data to
unicode before it's passed to the response. I don't think we do either
today; does anyone know?
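
Roughly, the two alternatives look like this. This is only a sketch in
modern Python; response_headers and body_as_text are hypothetical
helpers, not existing Zope APIs, and it assumes the stored bytes are
always UTF-8:

```python
def response_headers(content_type, stored_charset="utf-8"):
    """Alternative 1: send the stored bytes as-is and declare the charset."""
    is_text = content_type.startswith("text/")
    if is_text and "charset=" not in content_type.lower():
        content_type = "%s; charset=%s" % (content_type, stored_charset)
    return {"Content-Type": content_type}


def body_as_text(stored):
    """Alternative 2: decode to unicode and let the response encode it."""
    return stored.decode("utf-8")
```

With the first alternative the bytes go out untouched; with the second,
the response layer decides on the output encoding.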

> * if a user uploads a file and this file is *not* intended to be textual 
> data but binary data, then Zope doesn't do a thing, and just stores the 
> bytes. If the developer still uses this data as text at any stage, they 
> should be aware of encoding issues and decode in whatever encoding they 
> see fit. Of course the developer is better off using a stored text file 
> in that case, where unicode is already guaranteed.

Agreed. And if the developer later changes the content type to text/*,
the data should be handled the same way as when a text file is uploaded.

Regards,

Bjorn
