[Zope3-dev] Unicode handling in Zope3 Page Templates

Sidnei da Silva sidnei at awkly.org
Tue Aug 31 09:58:58 EDT 2004


On Tue, Aug 31, 2004 at 11:08:31AM +0200, Martijn Faassen wrote:
| >>If I use sys.setdefaultencoding('utf-8') in sitecustomize.py,
| > 
| >I consider this evil.  I know Guido would be happy if this wasn't in
| >the language.  This is just not an option.
| 
| I think Sidnei shouldn't even have brought this up; it should have 
| absolutely nothing to do with the rest of the discussion and what Sidnei 
| actually did in the end.

Yes, sorry for that. I shouldn't have even mentioned it.

| Sidnei, please please get rid of this 
| sys.setdefaultencoding() thing and forget it ever existed -- it's 
| absolutely impossible to support code that even faintly looks like it 
| may depend on it.

I did get rid of it.

| This code isn't depending on setdefaultencoding() though, as far as I 
| can see.

That's correct. Nothing depends on setdefaultencoding().

| >Probably not. Certainly not in its current form.
| 
| What is wrong with the current form, Jim? What it tries to do is make 
| the page template engine 'unicode agnostic', instead of 'unicode only'. 
| Right now, the page template engine in Zope 3 (unlike the one in Zope 
| 2) works only if you feed it unicode strings. This is fine in Zope 
| 3, but if you want to use it outside of Zope 3, this may not be what you 
| want.
| 
| So, Sidnei attempted to change it so that it'll work if you put in 
| normal (encoded) strings only, or if you put in unicode strings only. 
| Combinations will still fail miserably (this hasn't changed). The 
| failure will even happen in the same place -- getvalue() in StringIO at 
| the end. The only type of thing that can be combined safely with both is 
| plain ascii strings; i.e. it relies on the unchanged default encoding of 
| the system.
| 
| I can see something wrong with the following hack Sidnei employed in a 
| few places, where he replaced
| 
| unicode(text)
| 
| with
| 
| isinstance(text, basestring) and text or unicode(text)

Yes, that's certainly wrong now that I look at it. I think that's what
Jim was referring to?

| ..
| 
| plain unicode(text), like str(text), doesn't typically work in unicode 
| agnostic code.
| 
| Trying to reconstruct the logic in more readable form (which is 
| difficult, indicating that this code shouldn't be employed :), Sidnei's 
| code looks like this, I think..:

Did you mean 'deployed' above? :)

| if isinstance(text, basestring):
|     result = text
| else:
|     result = unicode(text)
| 
| this is the wrong thing if text is in fact not a basestring, but, say, a 
| number, which I suspect is something that can happen, even though the 
| thing is misleadingly called 'text' -- it's why the 'unicode()' is there 
| in the first place. In this case, the string representation of the 
| number will be in unicode, which will be wrong if you're running in 
| pure-encoded mode. Sidnei, you need to include a unit test where the 
| data that enters the page template is not a string; I think it will 
| fail. Also include a few tests where the data is actually 0 while you're 
| at it, if you are a fan of shortcuts. :)

Ok, added a test with numbers.
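
For reference, a minimal sketch of where the `and ... or` shortcut
misfires: the trap is a string that is itself falsy, i.e. the empty
string. The names `string_types`, `convert`, `shortcut` and `explicit`
are made up for illustration, and the try/except exists only so the
snippet also runs on interpreters that no longer have `basestring`.

```python
try:
    string_types = basestring        # Python 2, as in the patch
except NameError:
    string_types = (str, bytes)      # assumption: stand-in where basestring is gone

calls = []

def convert(value):
    # Stand-in for unicode(); records every value it is asked to convert.
    calls.append(value)
    return str(value)

def shortcut(value):
    # Same shape as the one-liner from the first patch.
    return isinstance(value, string_types) and value or convert(value)

def explicit(value):
    # Same shape as the if/else form suggested in this thread.
    if isinstance(value, string_types):
        return value
    return convert(value)

shortcut('')    # '' is falsy, so convert('') runs even though '' is a string
explicit('')    # returned untouched, convert() is not called
explicit(0)     # not a string, so it is converted
```

After these three calls, `calls` holds the empty string (from the
shortcut) and 0 (from the explicit form), which is the difference a
test with empty data would expose.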

| Anyway, what would work better is the following:
| 
| if isinstance(text, basestring):
|     result = text
| else:
|     result = str(text)

Changed to use this form.

| As long as str(text) == str(unicode(text)) is True (and doesn't fail 
| with a unicode error), this will at least work correctly in both unicode 
| mode (as it can deal with plain-ascii) as well as encoded mode (as it 
| can deal with plain ascii).
| 
| For built-ins other than unicode strings, I think 
| str(text) == str(unicode(text)) always holds. The problem remains with 
| objects that are not built-ins and may want to return unicode strings, 
| i.e. custom objects which define __unicode__(). Perhaps i18n-ed strings? 
| That's another good candidate for a test.

Not sure what's going to happen in this case. I think str() doesn't
even look at __unicode__()? Then we probably need your solution below.
I've added a test using i18n:translate, and then found that the base
'Context' object doesn't have a translate() method, even though the
page template will accept and execute i18n:translate commands. I've
added one that just returns the msgid. It should probably return a
unicode string there? In which case we are hosed if i18n:translate
tags are used in the template, as that's guaranteed to return unicode
strings AFAICS.
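
A small sketch of the question above, assuming a hypothetical `Message`
class modelled on the `Bruce` test helper: `str()` goes through
`__str__()` and never consults `__unicode__()`, so the patched code path
returns whatever `__str__()` gives. The `coerce` helper and
`string_types` alias are illustrative names, not part of the patch.

```python
try:
    string_types = basestring        # Python 2, as in the patch
except NameError:
    string_types = (str, bytes)      # assumption: stand-in where basestring is gone

class Message(object):
    # Hypothetical object that defines both conversions.
    def __str__(self):
        return 'from __str__'

    def __unicode__(self):
        return u'from __unicode__'

def coerce(value):
    # Same shape as the patched evaluateText(): strings pass through,
    # everything else goes through str().
    if isinstance(value, string_types):
        return value
    return str(value)

result = coerce(Message())   # __unicode__() is never called here
```

So an object whose `__unicode__()` carries non-ascii text while its
`__str__()` does not (or fails) would behave differently from before
the patch.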

| If we *do* need unicode(text) to work safely, we'll need to refactor the 
| ZPT code so it can actually run in 'encoded mode' as well as in 'unicode 
| mode'. Then any cases where we see 'unicode(text)' (not many, mind) 
| need to be replaced with something like:
| 
| if encoded_mode:
|     result = str(text)
| else:
|     result = unicode(text)
| 
| > The problem is that you can't really predict what the
| > encoding will be in Zope 2.  IMO, it is better not to guess.
| 
| That's not what Sidnei's code is trying to do. I suggested that he try 
| to make it unicode-agnostic. :)

Exactly.
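
The `encoded_mode` switch proposed above could be sketched roughly as
follows. This is only an illustration under assumptions: where the flag
would live (engine, template or interpreter) is not settled anywhere in
this thread, and `coerce`, `text_type` and `string_types` are invented
names. The aliases let the snippet run where `unicode` and `basestring`
no longer exist.

```python
try:
    text_type = unicode              # Python 2
    string_types = basestring
except NameError:
    text_type = str                  # assumption: stand-ins on newer interpreters
    string_types = (str, bytes)

def coerce(value, encoded_mode):
    # Strings are passed through unchanged in either mode.
    if isinstance(value, string_types):
        return value
    # Non-strings are converted to the string type of the current mode.
    if encoded_mode:
        return str(value)
    return text_type(value)

encoded = coerce(42, encoded_mode=True)
decoded = coerce(42, encoded_mode=False)
```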

| > If you did guess, you'd probably want to guess latin 1.
| 
| That would fail miserably in very common Zope 2 systems, like Silva or 
| Plone. :)

Yup. I've tried it *wink*

| > I don't have any good ideas for a short-term hack.  Maybe someone
| > else does.
| 
| My best hack so far is what I proposed above. It's not that different 
| from Sidnei's, though less buggy. :)

Ok, applied your suggestions. Here's the new patch.

-- 
Sidnei da Silva <sidnei at awkly.org>
http://awkly.org - dreamcatching :: making your dreams come true
http://www.enfoldsystems.com
http://plone.org/about/team#dreamcatcher

Remember, God could only create the world in 6 days because he didn't
have an established user base.
-------------- next part --------------
Index: src/zope/tal/talinterpreter.py
===================================================================
--- src/zope/tal/talinterpreter.py	(revision 27362)
+++ src/zope/tal/talinterpreter.py	(working copy)
@@ -562,7 +562,11 @@
         if structure is self.Default:
             self.interpret(block)
             return
-        text = unicode(structure)
+        if isinstance(structure, basestring):
+            text = structure
+        else:
+            text = str(structure)
+
         if not (repldict or self.strictinsert):
             # Take a shortcut, no error checking
             self.stream_write(text)
Index: src/zope/app/pagetemplate/engine.py
===================================================================
--- src/zope/app/pagetemplate/engine.py	(revision 27362)
+++ src/zope/app/pagetemplate/engine.py	(working copy)
@@ -21,6 +21,8 @@
 
 from zope.interface import implements
 
+from zope.hookable import hookable
+
 from zope.tales.expressions import PathExpr, StringExpr, NotExpr, DeferExpr
 from zope.tales.expressions import SimpleModuleImporter
 from zope.tales.pythonexpr import PythonExpr
@@ -102,7 +104,7 @@
         if isinstance(text, basestring):
             # text could be a proxied/wrapped object
             return text
-        return unicode(text)
+        return str(text)
 
     def evaluateMacro(self, expr):
         macro = Context.evaluateMacro(self, expr)
@@ -403,7 +405,6 @@
     def pt_getEngine(self):
         return Engine
 
-
 class TrustedAppPT(object):
 
     def pt_getEngine(self):
Index: src/zope/pagetemplate/pagetemplate.py
===================================================================
--- src/zope/pagetemplate/pagetemplate.py	(revision 27362)
+++ src/zope/pagetemplate/pagetemplate.py	(working copy)
@@ -112,7 +112,7 @@
         if self._v_errors:
             raise PTRuntimeError(str(self._v_errors))
 
-        output = StringIO(u'')
+        output = StringIO()
         context = self.pt_getEngineContext(namespace)
         TALInterpreter(self._v_program, self._v_macros,
                        context, output, tal=not source, strictinsert=0)()
Index: src/zope/pagetemplate/tests/input/nonascii.txt
===================================================================
--- src/zope/pagetemplate/tests/input/nonascii.txt	(revision 0)
+++ src/zope/pagetemplate/tests/input/nonascii.txt	(revision 0)
@@ -0,0 +1,2 @@
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?

Property changes on: src/zope/pagetemplate/tests/input/nonascii.txt
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/input/teeshop3.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop3.html	(revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop3.html	(revision 0)
@@ -0,0 +1,6 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+</div>
+</html>

Property changes on: src/zope/pagetemplate/tests/input/teeshop3.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/input/teeshop4.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop4.html	(revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop4.html	(revision 0)
@@ -0,0 +1,9 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+<span tal:attributes="id options/data" />
+<span i18n:translate="" tal:content="options/data">SPAM</span>
+<span i18n:attributes="name" tal:attributes="name options/data" />
+</div>
+</html>

Property changes on: src/zope/pagetemplate/tests/input/teeshop4.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/input/teeshop5.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop5.html	(revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop5.html	(revision 0)
@@ -0,0 +1,9 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+<span tal:attributes="id options/data" />
+<span i18n:translate="" tal:content="options/data">SPAM</span>
+<span i18n:attributes="name" tal:attributes="name options/data" />
+</div>
+</html>

Property changes on: src/zope/pagetemplate/tests/input/teeshop5.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/output/teeshop3.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop3.html	(revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop3.html	(revision 0)
@@ -0,0 +1,55 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+  <tr bgcolor="#0000CC" align="center"> 
+    <td> 
+      <table width="200" border="0" cellspacing="0" cellpadding="0">
+        <tr bgcolor="#FFFFFF"> 
+          <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+        </tr>
+      </table>
+    </td>
+  </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+  <tr align="center"> 
+    <td width="25%" class="boldbodylist">apparel</td>
+    <td width="25%" class="boldbodylist">mugs</td>
+    <td width="25%" class="boldbodylist">toys</td>
+    <td width="25%" class="boldbodylist">misc</td>
+  </tr>
+</table>
+<br>
+<br>
+<div>
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?
+
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?
+
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+  <tr> 
+    <td align="center" bgcolor="#FFFFFF" class="bodylist">
+      Copyright &copy; 2000 
+      <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+      All rights reserved. <br>
+      Questions or problems should be directed to
+      <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+      254-412-0846.</td>
+  </tr>
+  <tr> 
+    <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+  </tr>
+</table>
+</body>
+</html>

Property changes on: src/zope/pagetemplate/tests/output/teeshop3.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/output/teeshop4.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop4.html	(revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop4.html	(revision 0)
@@ -0,0 +1,54 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+  <tr bgcolor="#0000CC" align="center"> 
+    <td> 
+      <table width="200" border="0" cellspacing="0" cellpadding="0">
+        <tr bgcolor="#FFFFFF"> 
+          <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+        </tr>
+      </table>
+    </td>
+  </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+  <tr align="center"> 
+    <td width="25%" class="boldbodylist">apparel</td>
+    <td width="25%" class="boldbodylist">mugs</td>
+    <td width="25%" class="boldbodylist">toys</td>
+    <td width="25%" class="boldbodylist">misc</td>
+  </tr>
+</table>
+<br>
+<br>
+<div>
+42
+42
+<span id="42" />
+<span>42</span>
+<span name="42" />
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+  <tr> 
+    <td align="center" bgcolor="#FFFFFF" class="bodylist">
+      Copyright &copy; 2000 
+      <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+      All rights reserved. <br>
+      Questions or problems should be directed to
+      <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+      254-412-0846.</td>
+  </tr>
+  <tr> 
+    <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+  </tr>
+</table>
+</body>
+</html>

Property changes on: src/zope/pagetemplate/tests/output/teeshop4.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/output/teeshop5.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop5.html	(revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop5.html	(revision 0)
@@ -0,0 +1,54 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+  <tr bgcolor="#0000CC" align="center"> 
+    <td> 
+      <table width="200" border="0" cellspacing="0" cellpadding="0">
+        <tr bgcolor="#FFFFFF"> 
+          <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+        </tr>
+      </table>
+    </td>
+  </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+  <tr align="center"> 
+    <td width="25%" class="boldbodylist">apparel</td>
+    <td width="25%" class="boldbodylist">mugs</td>
+    <td width="25%" class="boldbodylist">toys</td>
+    <td width="25%" class="boldbodylist">misc</td>
+  </tr>
+</table>
+<br>
+<br>
+<div>
+bruce
+bruce
+<span id="bruce" />
+<span>bruce</span>
+<span name="bruce" />
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+  <tr> 
+    <td align="center" bgcolor="#FFFFFF" class="bodylist">
+      Copyright &copy; 2000 
+      <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+      All rights reserved. <br>
+      Questions or problems should be directed to
+      <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+      254-412-0846.</td>
+  </tr>
+  <tr> 
+    <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+  </tr>
+</table>
+</body>
+</html>

Property changes on: src/zope/pagetemplate/tests/output/teeshop5.html
___________________________________________________________________
Name: svn:eol-style
   + native

Index: src/zope/pagetemplate/tests/util.py
===================================================================
--- src/zope/pagetemplate/tests/util.py	(revision 27362)
+++ src/zope/pagetemplate/tests/util.py	(working copy)
@@ -23,6 +23,7 @@
 class Bruce(object):
     __allow_access_to_unprotected_subobjects__=1
     def __str__(self): return 'bruce'
+    def __unicode__(self): return u'bruce'
     def __int__(self): return 42
     def __float__(self): return 42.0
     def keys(self): return ['bruce']*7
@@ -74,15 +75,25 @@
     for i in xrange(lo, hi):
         print '%s %s' % (tag, x[i]),
 
-def check_html(s1, s2):
+def check_html(s1, s2, use_diff=False):
     s1 = normalize_html(s1)
     s2 = normalize_html(s2)
-    assert s1==s2, (s1, s2, "HTML Output Changed")
+    if use_diff:
+        from difflib import unified_diff
+        diff = '\n'.join(unified_diff(s1.splitlines(), s2.splitlines()))
+        assert s1==s2, ("HTML Output Changed:\n%s" % diff)
+    else:
+        assert s1==s2, ("HTML Output Changed:\n%s\n\n%s" % (s1, s2))
 
-def check_xml(s1, s2):
+def check_xml(s1, s2, use_diff=False):
     s1 = normalize_xml(s1)
     s2 = normalize_xml(s2)
-    assert s1==s2, ("XML Output Changed:\n%s\n\n%s" % (s1, s2))
+    if use_diff:
+        from difflib import unified_diff
+        diff = '\n'.join(unified_diff(s1.splitlines(), s2.splitlines()))
+        assert s1==s2, ("XML Output Changed:\n%s" % diff)
+    else:
+        assert s1==s2, ("XML Output Changed:\n%s\n\n%s" % (s1, s2))
 
 def normalize_html(s):
     s = re.sub(r"[ \t]+", " ", s)
Index: src/zope/pagetemplate/tests/test_htmltests.py
===================================================================
--- src/zope/pagetemplate/tests/test_htmltests.py	(revision 27362)
+++ src/zope/pagetemplate/tests/test_htmltests.py	(working copy)
@@ -68,6 +68,56 @@
         out = t(laf = self.folder.laf, getProducts = self.getProducts)
         util.check_html(expect, out)
 
+    def test_4(self):
+        # Check that sending encoded data yields
+        # encoded output instead of unicode output
+        self.folder.laf.write(util.read_input('teeshoplaf.html'))
+        data = util.read_input('nonascii.txt')
+        t = self.folder.t
+        t.write(util.read_input('teeshop3.html'))
+        expect = util.read_output('teeshop3.html')
+        data = unicode(data, 'latin-1').encode('utf-8')
+        out = t(laf = self.folder.laf, data = data)
+        expect = unicode(expect, 'latin-1').encode('utf-8')
+        util.check_html(expect, out)
+
+    def test_5(self):
+        # Check that sending unicode data yields
+        # unicode output in the same encoding
+        self.folder.laf.write(util.read_input('teeshoplaf.html'))
+        data = util.read_input('nonascii.txt')
+        t = self.folder.t
+        t.write(util.read_input('teeshop3.html'))
+        expect = util.read_output('teeshop3.html')
+        data = unicode(data, 'latin-1')
+        out = t(laf = self.folder.laf, data = data)
+        expect = unicode(expect, 'latin-1')
+        util.check_html(expect, out)
+
+    def test_6(self):
+        # Test for non-basestring data being sent.
+        # Should get plain string output, instead of unicode.
+        self.folder.laf.write(util.read_input('teeshoplaf.html'))
+        t = self.folder.t
+        t.write(util.read_input('teeshop4.html'))
+        expect = util.read_output('teeshop4.html')
+        data = 42
+        out = t(laf = self.folder.laf, data = data)
+        expect = expect
+        util.check_html(expect, out)
+
+    def test_7(self):
+        # Test for an object with __unicode__()
+        # Should get plain string output?
+        self.folder.laf.write(util.read_input('teeshoplaf.html'))
+        t = self.folder.t
+        t.write(util.read_input('teeshop5.html'))
+        expect = util.read_output('teeshop5.html')
+        data = util.bruce
+        out = t(laf = self.folder.laf, data = data)
+        expect = expect
+        util.check_html(expect, out)
+
     def test_SimpleLoop(self):
         t = self.folder.t
         t.write(util.read_input('loop1.html'))
Index: src/zope/tales/tales.py
===================================================================
--- src/zope/tales/tales.py	(revision 27362)
+++ src/zope/tales/tales.py	(working copy)
@@ -706,8 +706,13 @@
         text = self.evaluate(expr)
         if text is self.getDefault() or text is None:
             return text
-        return unicode(text)
+        if isinstance(text, basestring):
+            return text
+        return str(text)
 
+    def translate(self, msgid, domain=None, mapping=None, default=None):
+        return msgid
+
     def evaluateStructure(self, expr):
         return self.evaluate(expr)
     evaluateStructure = evaluate

