[zope2-tracker] [Bug 160968] Re: Default IUserPreferredCharsets' use of Zope 2's request problematic

Ole Christian Helset ochelset at gmail.com
Thu Feb 25 09:00:33 EST 2010


Using Zope 2.11.5 with default-zpublisher-encoding set to utf-8, rendering
content that contains a UTF-8 encoded string fails in IE and Safari, because
(at the time of writing) they do not send the Accept-Charset header.

In http.py (zope/publisher/http.py), the
HTTPCharsets.getPreferredCharsets() method returns an empty list,
causing a UnicodeDecodeError in Zope when a tal:content string contains
a UTF-8 encoded string with, e.g., Norwegian characters (ø -> \xc3\xb8).
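
A minimal sketch of the failure mode, assuming the UTF-8 byte string ends up
being decoded with Python's default ASCII codec when no preferred charset is
available (where exactly Zope performs that decode is an assumption here).
The helper decode_with is hypothetical and not part of Zope:

```python
# UTF-8 encoded bytes for the test title: the letter "ø" (U+00F8)
# becomes the two bytes \xc3\xb8.
utf8_title = u"P\xf8lse".encode("utf-8")


def decode_with(data, charset):
    """Hypothetical helper: return the decoded text, or the name of the
    exception raised if the bytes are not valid in that charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# ASCII cannot represent byte \xc3, so decoding fails.
ascii_result = decode_with(utf8_title, "ascii")

# With the right charset the title round-trips cleanly.
utf8_result = decode_with(utf8_title, "utf-8")
```

Knowing that 'utf-8' is an acceptable charset is what lets the second call
succeed; an empty charset list leaves nothing but the ASCII fallback.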

I made a simple test: a default page template with a title containing such a character (e.g. Pølse):
<html>
  <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
  </head>
  <body>
    <tal:block content="python:repr(template.title)" /><br />
    <tal:block content="python:repr(template.title.encode('latin-1'))" /><br />
    <tal:block content="python:repr(template.title.encode('utf-8'))" /><br />
    <tal:block content="python:title" define="title python:template.title" /><br />
    <tal:block content="python:title" define="title python:template.title.encode('utf-8')" /><br />
  </body>
</html>

In Firefox the output is fine:
u'P\xf8lse'
'P\xf8lse'
'P\xc3\xb8lse'
Pølse
Pølse

In IE and Safari it raises a UnicodeDecodeError.


If HTTPCharsets.getPreferredCharsets() returns ['utf-8'], it works fine in IE and Safari as well.

My changes to http.py:
from zope.publisher.base import RequestDataGetter
+from ZPublisher import Converters

...

        # Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
        # field, then all character sets not explicitly mentioned get a
        # quality value of 0, except for ISO-8859-1, which gets a quality
        # value of 1 if not explicitly mentioned.
        # And quoting RFC 2616, $14.2: "If no Accept-Charset header is
        # present, the default is that any character set is acceptable."
        if not sawstar and not sawiso88591 and header_present:
-            charsets.append((1.0, 'iso-8859-1'))
+            charsets.append((1.0, Converters.default_encoding))
        # UTF-8 is **always** preferred over anything else.
        # Reason: UTF-8 is not specific and can encode the entire unicode
        # range , unlike many other encodings. Since Zope can easily use very
        # different ranges, like providing a French-Chinese dictionary, it is
        # always good to use UTF-8.
        charsets.sort(sort_charsets)
        charsets = [charset for quality, charset in charsets]
-        if sawstar and 'utf-8' not in charsets:
+        if not sawstar and 'utf-8' not in charsets: # IS THIS BAD, TO FORCE IN UTF-8???
            charsets.insert(0, 'utf-8')

The question, then: is it a problem to force utf-8 (or the
default-zpublisher-encoding) here when HTTP_ACCEPT_CHARSET is missing
from the request?
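
To make the intended behaviour concrete, here is a simplified, standalone
sketch of charset negotiation with a fallback for a missing header. It is not
the zope.publisher implementation: the function name preferred_charsets and
its default_encoding parameter are hypothetical, and the implicit ISO-8859-1
rule from RFC 2616 is deliberately left out to keep it short.

```python
def preferred_charsets(accept_charset, default_encoding="utf-8"):
    """Return charsets in order of preference (hypothetical helper).

    accept_charset is the raw Accept-Charset header value, or None when
    the browser did not send the header at all.
    """
    if not accept_charset:
        # Missing header: RFC 2616 says any charset is acceptable, so
        # fall back to the configured encoding instead of returning [].
        return [default_encoding]

    weighted = []
    saw_star = False
    for item in accept_charset.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # ignore entries with a malformed quality value
        if name == "*":
            # "*" only counts as a wildcard when it is not excluded (q=0).
            saw_star = quality > 0
            continue
        if quality > 0:
            weighted.append((quality, name))

    # Highest quality first; the sort is stable, so ties keep header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    charsets = [name for _, name in weighted]
    if saw_star and default_encoding not in charsets:
        charsets.insert(0, default_encoding)
    return charsets
```

With this shape, a request without Accept-Charset (the IE and Safari case)
yields ['utf-8'] rather than an empty list, while an explicit header is still
honoured as sent.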

-- 
Default IUserPreferredCharsets' use of Zope 2's request problematic
https://bugs.launchpad.net/bugs/160968
You received this bug notification because you are a member of Zope 2
Developers, which is subscribed to Zope 2.

