[Zope] charset from forms input

Matt matt.bion@eudoramail.com
Thu, 14 Dec 2000 09:45:53 +1300


Hi, I seem to have come across the depressing fact that most browsers
will not return a charset parameter in the HTTP header when a form is
submitted.  For example, here is a request from Netscape (it happens
with both IE and Netscape on the many platforms I have tried: Mac, all
versions of Windows, and Linux):

POST /hi HTTP/1.0
Referer: http://localhost:8080/temp/test_form
Connection: Keep-Alive
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
Host: 172.16.21.165:50009
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

Content-type: multipart/form-data; boundary=---------------------------17670043309955870831526446972
Content-Length: 180
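
A minimal sketch in Python (the language Zope is written in) of what a server sees when it looks for the charset parameter on a Content-type like the one above. `declared_charset` is a hypothetical helper built on the standard library's `email.message.Message.get_content_charset`, and the ISO-8859-1 fallback is an assumption the server has to make for itself, since the request declares no charset.

```python
from email.message import Message


def declared_charset(content_type, default=None):
    """Hypothetical helper: return the charset parameter of a
    Content-Type header value, or `default` if there is none."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lower-cased, or the
    # fallback value when the header carries no charset parameter.
    return msg.get_content_charset(default)


header = ("multipart/form-data; "
          "boundary=---------------------------17670043309955870831526446972")
print(declared_charset(header))                        # None: no charset sent
print(declared_charset(header, "iso-8859-1"))          # the assumed default
print(declared_charset("text/plain; charset=UTF-8"))   # utf-8
```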


So much for a useful Content-type.  I know this is NOT a Zope issue,
but I was hoping someone had an easy answer.  There is such a myriad of
character encodings out there that they are quite difficult to
handle.  The example that frustrates us most is this: two common
defaults people set their browsers to on Windows are Western (ISO) and
Western (Windows).  Both are one-byte encodings: the former is
ISO-8859-1, and the latter is Windows-1252, i.e. ISO-8859-1 plus the
unhelpful use of the 0x80-0x9F control range for printable characters
such as the ellipsis (0x85) and curly quotes (0x91-0x94).
People often copy and paste from Word into form text inputs, and as a
quick hack we made up a byte conversion table for the "Microsoft"
range.  So Western (Windows) works, but of course Western (ISO) does not.
How does one detect these?  And more to the point, how does one easily
test for any of the other encoding standards?
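
One common heuristic, sketched here in Python under stated assumptions, is to try a strict UTF-8 decode first, then treat bytes in 0x80-0x9F as evidence of Windows-1252, and otherwise fall back to ISO-8859-1. `guess_decode` is a hypothetical helper; it uses Python's built-in `cp1252` codec in place of a hand-made conversion table. It is a guess, not real detection.

```python
from typing import Tuple


def guess_decode(raw: bytes) -> Tuple[str, str]:
    """Hypothetical helper: guess a form field's encoding and decode it.

    Heuristic only: without a declared charset, some byte strings are
    valid in several encodings at once.
    """
    # 1. Strict UTF-8 first. Latin-1/Windows-1252 text with accented
    #    letters rarely happens to form valid UTF-8 sequences.
    #    (Pure ASCII also lands here, which is harmless.)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. Bytes 0x80-0x9F are C1 control codes in ISO-8859-1 and are almost
    #    never typed on purpose; in Windows-1252 they are curly quotes,
    #    dashes, the ellipsis and so on (typical of text pasted from Word).
    if any(0x80 <= b <= 0x9F for b in raw):
        try:
            return raw.decode("cp1252"), "cp1252"
        except UnicodeDecodeError:
            pass  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in cp1252
    # 3. Fall back to ISO-8859-1, which maps every byte to some character.
    return raw.decode("latin-1"), "iso-8859-1"


print(guess_decode(b"\x93quoted\x94 \x85"))       # curly quotes + ellipsis, 'cp1252'
print(guess_decode("caf\u00e9".encode("utf-8")))  # ('café', 'utf-8')
print(guess_decode(b"caf\xe9"))                   # ('café', 'iso-8859-1')
```

This only separates UTF-8, Windows-1252 and ISO-8859-1. It cannot tell ISO-8859-1 from another single-byte set such as ISO-8859-2, because the same bytes decode without error in both.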

Surely this has bugged a lot of people?

regards
Matt