[Zope-Checkins] CVS: Zope3/lib/python/Zope/Publisher/HTTP - HTTPCharsets.py:1.4

Guido van Rossum guido@python.org
Fri, 14 Jun 2002 15:33:57 -0400


> +        # UTF-8 is **always** preferred over anything else.
> +        # XXX Please give more details as to why!

I'm guessing that is because all UTF-8 strings are legal Latin-1
strings (and probably also legal in most other "mode-less" 8-bit
encodings), but in practice, *most* Latin-1 strings that contain
non-ASCII characters aren't valid UTF-8.  So if you see a string
that's legal UTF-8 and also legal
Latin-1, it's more likely that the UTF-8 interpretation is what was
intended, because it's statistically very unlikely that you'd arrive
at a legal UTF-8 string by typing something meaningful in Latin-1.
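
A minimal sketch of that heuristic in Python 3, not the HTTPCharsets.py
code itself: try UTF-8 first and fall back to Latin-1 only when the
bytes are not valid UTF-8.  The function name decode_preferring_utf8 is
hypothetical, and the sketch assumes Python's "latin-1" codec, which
maps every byte value to a code point and so never fails to decode.

```python
from typing import Tuple


def decode_preferring_utf8(data: bytes) -> Tuple[str, str]:
    """Return (text, charset), trying UTF-8 before Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        # Valid UTF-8 is taken at face value: a meaningful Latin-1
        # string is very unlikely to form valid UTF-8 by accident.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Python's latin-1 codec accepts every byte value, so this
        # fallback cannot raise.
        return data.decode("latin-1"), "latin-1"


# "café" typed in Latin-1: the lone 0xE9 byte is not valid UTF-8.
latin1_bytes = "café".encode("latin-1")   # b'caf\xe9'
# The same word in UTF-8: 0xC3 0xA9 is a well-formed two-byte sequence.
utf8_bytes = "café".encode("utf-8")       # b'caf\xc3\xa9'

print(decode_preferring_utf8(latin1_bytes))
print(decode_preferring_utf8(utf8_bytes))
# The UTF-8 bytes are also legal Latin-1, but read that way they are
# mojibake rather than anything a person would have typed.
print(utf8_bytes.decode("latin-1"))
```

Both byte strings decode without error as Latin-1, but only the second
one decodes as UTF-8, which is why checking UTF-8 first settles the
ambiguous case.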

--Guido van Rossum (home page: http://www.python.org/~guido/)