[Zope3-dev] Unicode handling in Zope3 Page Templates
Sidnei da Silva
sidnei at awkly.org
Tue Aug 31 09:58:58 EDT 2004
On Tue, Aug 31, 2004 at 11:08:31AM +0200, Martijn Faassen wrote:
| >>If I use sys.setdefaultencoding('utf-8') in sitecustomize.py,
| >
| >I consider this evil. I know Guido would be happy if this wasn't in
| >the language. This is just not an option.
|
| I think Sidnei shouldn't even have brought this up; it should have
| absolutely nothing to do with the rest of the discussion and what Sidnei
| actually did in the end.
Yes, sorry for that. I shouldn't have even mentioned it.
| Sidnei, please please get rid of this
| sys.setdefaultencoding() thing and forget it ever existed -- it's
| absolutely impossible to support code that even faintly looks like it
| may depend on it.
I did get rid of it.
| This code isn't depending on setdefaultencoding() though, as far as I
| can see.
That's correct. Nothing depends on setdefaultencoding().
| >Probably not. Certainly not in its current form.
|
| What is wrong with the current form, Jim? What it tries to do is make
| the page template engine 'unicode agnostic', instead of 'unicode only'.
| Right now, the page template engine in Zope 3 (unlike the one in Zope
| 2), only works if you feed in unicode strings only. This is fine in Zope
| 3, but if you want to use it outside of Zope 3, this may not be what you
| want.
|
| So, Sidnei attempted to change it so that it'll work if you put in
| normal (encoded) strings only, or if you put in unicode strings only.
| Combinations will still fail miserably (this hasn't changed). The
| failure will even happen in the same place -- getvalue() in StringIO at
| the end. The only type of thing that can be combined safely with both is
| plain ascii strings; i.e. it relies on the unchanged default encoding of
| the system.
|
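A minimal sketch of the failure mode described above, where chunks are only collected during rendering and the mismatch surfaces in the final join. It is written for Python 3, with bytes standing in for encoded strings and str for unicode strings. ChunkBuffer and render() are hypothetical names for illustration, not part of zope.tal or the patch.

```python
class ChunkBuffer:
    """Collects written chunks and joins them only in getvalue()."""

    def __init__(self):
        self.chunks = []

    def write(self, chunk):
        # No type checking here: any mismatch goes unnoticed until the join.
        self.chunks.append(chunk)

    def getvalue(self):
        if not self.chunks:
            return ""
        # Use an empty separator of the same type as the first chunk.
        empty = self.chunks[0][:0]
        return empty.join(self.chunks)


def render(chunks):
    out = ChunkBuffer()
    for chunk in chunks:
        out.write(chunk)
    return out.getvalue()


# Encoded-only input gives encoded output.
encoded_only = render([b"caf\xc3\xa9 ", b"ol\xc3\xa9"])

# Unicode-only input gives unicode output.
text_only = render(["caf\u00e9 ", "ol\u00e9"])

# Mixing the two fails, and only at the very end, in getvalue().
try:
    render(["caf\u00e9 ", b"ol\xc3\xa9"])
    mixed_failed = False
except TypeError:
    mixed_failed = True
```

On Python 2 the mixed case would only fail when the encoded chunk holds non-ASCII bytes. Python 3 refuses every str/bytes mix, so the sketch is stricter than the behaviour described above.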
| I can see something wrong with the following hack Sidnei employed in a
| few places, where he replaced
|
| unicode(text)
|
| with
|
| isinstance(text, basestring) and text or unicode(text)
Yes, that's certainly wrong now that I look at it. I think that's what
Jim was referring to?
| ..
|
| plain unicode(text), like str(text), doesn't typically work in unicode
| agnostic code.
|
| Trying to reconstruct the logic in more readable form (which is
| difficult, indicating that this code shouldn't be employed :), Sidnei's
| code looks like this, I think..:
Did you mean 'deployed' above? :)
| if isinstance(text, basestring):
|     result = text
| else:
|     result = unicode(text)
|
| this is the wrong thing if text is in fact not a basestring, but, say, a
| number, which I suspect is something that can happen, even though the
| thing is misleadingly called 'text' -- it's why the 'unicode()' is there
| in the first place. In this case, the string representation of the
| number will be in unicode, which will be wrong if you're running in
| pure-encoded mode. Sidnei, you need to include a unit test where the
| data that enters the page template is not a string; I think it will
| fail. Also include a few tests where the data is actually 0 while you're
| at it, if you are a fan of shortcuts. :)
Ok, added a test with numbers.
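To make the pitfall with the and/or shortcut concrete, here is a minimal sketch. It is written for Python 3, where bytes stands in for the encoded str of this thread. The names shortcut() and explicit() are made up for illustration and do not appear in the patch.

```python
# Python 3 stand-in for Python 2's basestring.
STRING_TYPES = (bytes, str)


def shortcut(text):
    # The and/or idiom: falls through to str() whenever 'text' is falsy,
    # even if it already is a string.
    return isinstance(text, STRING_TYPES) and text or str(text)


def explicit(text):
    # The spelled-out form: strings always pass through untouched.
    if isinstance(text, STRING_TYPES):
        return text
    return str(text)


samples = (b"spam", b"", 0, 42)
shortcut_results = [shortcut(value) for value in samples]
explicit_results = [explicit(value) for value in samples]
```

The two forms only disagree on the empty string, which the shortcut converts even though it is already a string. In Python 2 the same slip is quieter: an empty str falls through to unicode('') and comes back as u'', so only the type changes. The number 0 is not affected, because isinstance() is already false for it.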
| Anyway, what would work better is the following:
|
| if isinstance(text, basestring):
|     result = text
| else:
|     result = str(text)
Changed to use this form.
| As long as str(text) == str(unicode(text)) is True (and doesn't fail
| with a unicode error), this will at least work correctly in both unicode
| mode (as it can deal with plain-ascii) as well as encoded mode (as it
| can deal with plain ascii).
|
| For built-ins outside unicode strings, str(text) == str(unicode(text)) I
| think always applies. The problem remains with other objects which are
| not built-ins which may want to return unicode strings; i.e. custom
| objects which define __unicode__(). Perhaps i18n-ed strings? -- that's
| another good candidate for a test.
Not sure what's going to happen in this case. I think str() doesn't
even look at __unicode__()? Then we probably need your solution below.
I've added a test using i18n:translate, and then found that the base
'Context' object doesn't have a translate() method, even though the
page template will accept and execute i18n:translate commands. I've
added one that just returns the msgid. It should probably return a
unicode string there? In which case we are hosed if i18n:translate
tags are used in the template, as that's guaranteed to return unicode
strings AFAICS.
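On the str() question: str() calls __str__() and never consults __unicode__(). In Python 2 only unicode() looks at __unicode__(). A small sketch that runs on either Python version follows. The Bruce class here is a stand-in modelled on the one in tests/util.py, with a deliberately different __unicode__() return value (my assumption, not what the patch uses) so the two paths can be told apart.

```python
class Bruce(object):
    def __str__(self):
        return "bruce"

    def __unicode__(self):
        # Deliberately different from __str__() so we can see which one ran.
        return u"bruce (from __unicode__)"


# str() goes through __str__() only.
via_str = str(Bruce())

# __unicode__() is only reached when something asks for it explicitly.
via_unicode_method = Bruce().__unicode__()
```

So with the str()-based fallback, an object whose useful text lives only in __unicode__() would be rendered through __str__() instead.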
| If we *do* need unicode(text) to work safely, we'll need to refactor the
| ZPT code so it can actually run in 'encoded mode' as well as in 'unicode
| mode'. Then any cases where we see 'unicode(text)' (not many, mind),
| need to be replaced with something like:
|
| if encoded_mode:
|     result = str(text)
| else:
|     result = unicode(text)
|
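A rough sketch of what such a mode switch could look like, written for Python 3 with bytes as the encoded type and str as the unicode type. make_coercer() and its encoding parameter are assumptions for illustration only. Nothing like this exists in ZPT, and the snippet quoted above simply calls str() in encoded mode.

```python
def make_coercer(encoded_mode, encoding="ascii"):
    """Return a function that turns non-string values into the output type."""

    def coerce(value):
        # Strings of either kind pass through untouched.
        if isinstance(value, (bytes, str)):
            return value
        text = str(value)
        if encoded_mode:
            # Encoded mode: hand back an encoded (byte) string.
            return text.encode(encoding)
        # Unicode mode: hand back text.
        return text

    return coerce


to_encoded = make_coercer(encoded_mode=True)
to_unicode = make_coercer(encoded_mode=False)

encoded_number = to_encoded(42)
unicode_number = to_unicode(42)
```

The point of choosing the mode once, up front, is that the interpreter never has to guess the output type from whatever value happens to arrive.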
| > The problem is that you can't really predict what the
| > encoding will be in Zope 2. IMO, it is better not to guess.
|
| That's not what Sidnei's code is trying to do. I suggested to him to try
| to make it unicode-agnostic. :)
Exactly.
| > If you did guess, you'd probably want to guess latin 1.
|
| That would fail miserably in very common Zope 2 systems, like Silva or
| Plone. :)
Yup. I've tried it *wink*
| > I don't have any good ideas for a short-term hack. Maybe someone
| > else does.
|
| My best hack so far is what I proposed above. It's not that different
| from Sidnei's, though less buggy. :)
Ok, applied your suggestions. Here's the new patch.
--
Sidnei da Silva <sidnei at awkly.org>
http://awkly.org - dreamcatching :: making your dreams come true
http://www.enfoldsystems.com
http://plone.org/about/team#dreamcatcher
Remember, God could only create the world in 6 days because he didn't
have an established user base.
-------------- next part --------------
Index: src/zope/tal/talinterpreter.py
===================================================================
--- src/zope/tal/talinterpreter.py (revision 27362)
+++ src/zope/tal/talinterpreter.py (working copy)
@@ -562,7 +562,11 @@
if structure is self.Default:
self.interpret(block)
return
- text = unicode(structure)
+ if isinstance(structure, basestring):
+ text = structure
+ else:
+ text = str(structure)
+
if not (repldict or self.strictinsert):
# Take a shortcut, no error checking
self.stream_write(text)
Index: src/zope/app/pagetemplate/engine.py
===================================================================
--- src/zope/app/pagetemplate/engine.py (revision 27362)
+++ src/zope/app/pagetemplate/engine.py (working copy)
@@ -21,6 +21,8 @@
from zope.interface import implements
+from zope.hookable import hookable
+
from zope.tales.expressions import PathExpr, StringExpr, NotExpr, DeferExpr
from zope.tales.expressions import SimpleModuleImporter
from zope.tales.pythonexpr import PythonExpr
@@ -102,7 +104,7 @@
if isinstance(text, basestring):
# text could be a proxied/wrapped object
return text
- return unicode(text)
+ return str(text)
def evaluateMacro(self, expr):
macro = Context.evaluateMacro(self, expr)
@@ -403,7 +405,6 @@
def pt_getEngine(self):
return Engine
-
class TrustedAppPT(object):
def pt_getEngine(self):
Index: src/zope/pagetemplate/pagetemplate.py
===================================================================
--- src/zope/pagetemplate/pagetemplate.py (revision 27362)
+++ src/zope/pagetemplate/pagetemplate.py (working copy)
@@ -112,7 +112,7 @@
if self._v_errors:
raise PTRuntimeError(str(self._v_errors))
- output = StringIO(u'')
+ output = StringIO()
context = self.pt_getEngineContext(namespace)
TALInterpreter(self._v_program, self._v_macros,
context, output, tal=not source, strictinsert=0)()
Index: src/zope/pagetemplate/tests/input/nonascii.txt
===================================================================
--- src/zope/pagetemplate/tests/input/nonascii.txt (revision 0)
+++ src/zope/pagetemplate/tests/input/nonascii.txt (revision 0)
@@ -0,0 +1,2 @@
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?
Property changes on: src/zope/pagetemplate/tests/input/nonascii.txt
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/input/teeshop3.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop3.html (revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop3.html (revision 0)
@@ -0,0 +1,6 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+</div>
+</html>
Property changes on: src/zope/pagetemplate/tests/input/teeshop3.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/input/teeshop4.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop4.html (revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop4.html (revision 0)
@@ -0,0 +1,9 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+<span tal:attributes="id options/data" />
+<span i18n:translate="" tal:content="options/data">SPAM</span>
+<span i18n:attributes="name" tal:attributes="name options/data" />
+</div>
+</html>
Property changes on: src/zope/pagetemplate/tests/input/teeshop4.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/input/teeshop5.html
===================================================================
--- src/zope/pagetemplate/tests/input/teeshop5.html (revision 0)
+++ src/zope/pagetemplate/tests/input/teeshop5.html (revision 0)
@@ -0,0 +1,9 @@
+<html metal:use-macro="options/laf/macros/page">
+<div metal:fill-slot="body">
+<tal:block replace="options/data" />
+<tal:block replace="structure options/data" />
+<span tal:attributes="id options/data" />
+<span i18n:translate="" tal:content="options/data">SPAM</span>
+<span i18n:attributes="name" tal:attributes="name options/data" />
+</div>
+</html>
Property changes on: src/zope/pagetemplate/tests/input/teeshop5.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/output/teeshop3.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop3.html (revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop3.html (revision 0)
@@ -0,0 +1,55 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#0000CC" align="center">
+ <td>
+ <table width="200" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#FFFFFF">
+ <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+ </tr>
+ </table>
+ </td>
+ </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+ <tr align="center">
+ <td width="25%" class="boldbodylist">apparel</td>
+ <td width="25%" class="boldbodylist">mugs</td>
+ <td width="25%" class="boldbodylist">toys</td>
+ <td width="25%" class="boldbodylist">misc</td>
+ </tr>
+</table>
+<br>
+<br>
+<div>
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?
+
+In every census between 1960 and 2000, rural counties have constituted
+95 percent of those labeled ?persistently poor.?
+
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+ <tr>
+ <td align="center" bgcolor="#FFFFFF" class="bodylist">
+ Copyright © 2000
+ <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+ All rights reserved. <br>
+ Questions or problems should be directed to
+ <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+ 254-412-0846.</td>
+ </tr>
+ <tr>
+ <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+ </tr>
+</table>
+</body>
+</html>
Property changes on: src/zope/pagetemplate/tests/output/teeshop3.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/output/teeshop4.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop4.html (revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop4.html (revision 0)
@@ -0,0 +1,54 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#0000CC" align="center">
+ <td>
+ <table width="200" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#FFFFFF">
+ <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+ </tr>
+ </table>
+ </td>
+ </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+ <tr align="center">
+ <td width="25%" class="boldbodylist">apparel</td>
+ <td width="25%" class="boldbodylist">mugs</td>
+ <td width="25%" class="boldbodylist">toys</td>
+ <td width="25%" class="boldbodylist">misc</td>
+ </tr>
+</table>
+<br>
+<br>
+<div>
+42
+42
+<span id="42" />
+<span>42</span>
+<span name="42" />
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+ <tr>
+ <td align="center" bgcolor="#FFFFFF" class="bodylist">
+ Copyright © 2000
+ <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+ All rights reserved. <br>
+ Questions or problems should be directed to
+ <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+ 254-412-0846.</td>
+ </tr>
+ <tr>
+ <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+ </tr>
+</table>
+</body>
+</html>
Property changes on: src/zope/pagetemplate/tests/output/teeshop4.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/output/teeshop5.html
===================================================================
--- src/zope/pagetemplate/tests/output/teeshop5.html (revision 0)
+++ src/zope/pagetemplate/tests/output/teeshop5.html (revision 0)
@@ -0,0 +1,54 @@
+<html>
+<head>
+<title>Zope Stuff</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<link rel="stylesheet" href="/common.css">
+</head>
+
+<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
+<table width="100%" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#0000CC" align="center">
+ <td>
+ <table width="200" border="0" cellspacing="0" cellpadding="0">
+ <tr bgcolor="#FFFFFF">
+ <td><img src="/images/lside.gif" width="52" height="94"><img src="/images/swlogo.gif" width="150" height="89"><img src="/images/rside.gif" width="52" height="94"></td>
+ </tr>
+ </table>
+ </td>
+ </tr>
+</table>
+<br>
+<table width="300" border="0" cellspacing="0" cellpadding="0" align="center">
+ <tr align="center">
+ <td width="25%" class="boldbodylist">apparel</td>
+ <td width="25%" class="boldbodylist">mugs</td>
+ <td width="25%" class="boldbodylist">toys</td>
+ <td width="25%" class="boldbodylist">misc</td>
+ </tr>
+</table>
+<br>
+<br>
+<div>
+bruce
+bruce
+<span id="bruce" />
+<span>bruce</span>
+<span name="bruce" />
+</div>
+<br><br>
+<table width="100%" border="0" cellspacing="1" cellpadding="3" align="center">
+ <tr>
+ <td align="center" bgcolor="#FFFFFF" class="bodylist">
+ Copyright © 2000
+ <a href="http://www.4-am.com">4AM Productions, Inc.</a>.
+ All rights reserved. <br>
+ Questions or problems should be directed to
+ <a href="mailto:webmaster at teamzonline.com">the webmaster</a>,
+ 254-412-0846.</td>
+ </tr>
+ <tr>
+ <td align="center"><img src="/images/zopelogos/buildzope.gif" width="54" height="54"></td>
+ </tr>
+</table>
+</body>
+</html>
Property changes on: src/zope/pagetemplate/tests/output/teeshop5.html
___________________________________________________________________
Name: svn:eol-style
+ native
Index: src/zope/pagetemplate/tests/util.py
===================================================================
--- src/zope/pagetemplate/tests/util.py (revision 27362)
+++ src/zope/pagetemplate/tests/util.py (working copy)
@@ -23,6 +23,7 @@
class Bruce(object):
__allow_access_to_unprotected_subobjects__=1
def __str__(self): return 'bruce'
+ def __unicode__(self): return u'bruce'
def __int__(self): return 42
def __float__(self): return 42.0
def keys(self): return ['bruce']*7
@@ -74,15 +75,25 @@
for i in xrange(lo, hi):
print '%s %s' % (tag, x[i]),
-def check_html(s1, s2):
+def check_html(s1, s2, use_diff=False):
s1 = normalize_html(s1)
s2 = normalize_html(s2)
- assert s1==s2, (s1, s2, "HTML Output Changed")
+ if use_diff:
+ from difflib import unified_diff
+ diff = '\n'.join(unified_diff(s1.splitlines(), s2.splitlines()))
+ assert s1==s2, ("HTML Output Changed:\n%s" % diff)
+ else:
+ assert s1==s2, ("HTML Output Changed:\n%s\n\n%s" % (s1, s2))
-def check_xml(s1, s2):
+def check_xml(s1, s2, use_diff=False):
s1 = normalize_xml(s1)
s2 = normalize_xml(s2)
- assert s1==s2, ("XML Output Changed:\n%s\n\n%s" % (s1, s2))
+ if use_diff:
+ from difflib import unified_diff
+ diff = '\n'.join(unified_diff(s1.splitlines(), s2.splitlines()))
+ assert s1==s2, ("XML Output Changed:\n%s" % diff)
+ else:
+ assert s1==s2, ("XML Output Changed:\n%s\n\n%s" % (s1, s2))
def normalize_html(s):
s = re.sub(r"[ \t]+", " ", s)
Index: src/zope/pagetemplate/tests/test_htmltests.py
===================================================================
--- src/zope/pagetemplate/tests/test_htmltests.py (revision 27362)
+++ src/zope/pagetemplate/tests/test_htmltests.py (working copy)
@@ -68,6 +68,56 @@
out = t(laf = self.folder.laf, getProducts = self.getProducts)
util.check_html(expect, out)
+ def test_4(self):
+ # Check that sending encoded data yields
+ # encoded output instead of unicode output
+ self.folder.laf.write(util.read_input('teeshoplaf.html'))
+ data = util.read_input('nonascii.txt')
+ t = self.folder.t
+ t.write(util.read_input('teeshop3.html'))
+ expect = util.read_output('teeshop3.html')
+ data = unicode(data, 'latin-1').encode('utf-8')
+ out = t(laf = self.folder.laf, data = data)
+ expect = unicode(expect, 'latin-1').encode('utf-8')
+ util.check_html(expect, out)
+
+ def test_5(self):
+ # Check that sending unicode data yields
+ # unicode output
+ self.folder.laf.write(util.read_input('teeshoplaf.html'))
+ data = util.read_input('nonascii.txt')
+ t = self.folder.t
+ t.write(util.read_input('teeshop3.html'))
+ expect = util.read_output('teeshop3.html')
+ data = unicode(data, 'latin-1')
+ out = t(laf = self.folder.laf, data = data)
+ expect = unicode(expect, 'latin-1')
+ util.check_html(expect, out)
+
+ def test_6(self):
+ # Test for non-basestring data being sent.
+ # Should get plain string output, instead of unicode.
+ self.folder.laf.write(util.read_input('teeshoplaf.html'))
+ t = self.folder.t
+ t.write(util.read_input('teeshop4.html'))
+ expect = util.read_output('teeshop4.html')
+ data = 42
+ out = t(laf = self.folder.laf, data = data)
+ expect = expect
+ util.check_html(expect, out)
+
+ def test_7(self):
+ # Test for an object with __unicode__()
+ # Should get plain string output?
+ self.folder.laf.write(util.read_input('teeshoplaf.html'))
+ t = self.folder.t
+ t.write(util.read_input('teeshop5.html'))
+ expect = util.read_output('teeshop5.html')
+ data = util.bruce
+ out = t(laf = self.folder.laf, data = data)
+ expect = expect
+ util.check_html(expect, out)
+
def test_SimpleLoop(self):
t = self.folder.t
t.write(util.read_input('loop1.html'))
Index: src/zope/tales/tales.py
===================================================================
--- src/zope/tales/tales.py (revision 27362)
+++ src/zope/tales/tales.py (working copy)
@@ -706,8 +706,13 @@
text = self.evaluate(expr)
if text is self.getDefault() or text is None:
return text
- return unicode(text)
+ if isinstance(text, basestring):
+ return text
+ return str(text)
+ def translate(self, msgid, domain=None, mapping=None, default=None):
+ return msgid
+
def evaluateStructure(self, expr):
return self.evaluate(expr)
evaluateStructure = evaluate