[Zope3-dev] zope.tal.xmlparser.XMLParser() dislikes unicode

Andreas Jung lists at zopyx.com
Sat Jan 13 12:49:38 EST 2007


Hi,

the XMLParser.parseString() method raises an exception

  File "/opt/python-2.4.4/lib/python2.4/unittest.py", line 260, in run
    testMethod()
  File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/tests/test_xmlparser.py", line 127, in test_xx
    self._run_check(xml, ())
  File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/tests/test_xmlparser.py", line 106, in _run_check
    parser.parseString(source)
  File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/xmlparser.py", line 77, in parseString
    self.parser.Parse(s, 1)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 43-48: ordinal not in range(128)

if the string to be parsed is a unicode string and contains non-ASCII
characters. The following snippet from a private unit test
(test_xmlparsers.py) shows the error.

    def test_xx(self):
        xml = unicode('<?xml version="1.0" encoding="utf-8"?><foo>üöä</foo>',
                      'iso-8859-15')
        self._run_check(xml, ())

I am not sure whether this behavior is intentional. Is the XMLParser
supposed to deal with unicode strings, or will it only accept a standard
Python string? A workaround inside parseString() would be to check for
unicode and convert the string on the fly to a Python string with utf-8
encoding. This is possibly a limitation of the underlying Expat parser.
Any recommendation on how to deal with this issue?
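
To make the proposed workaround concrete, here is a minimal sketch that
uses the standard library's xml.parsers.expat directly. The helper name
parse_string() and the text_type alias are hypothetical and only for
illustration; this is not the actual zope.tal code. It assumes the XML
declaration either names utf-8 or is absent, since re-encoding to utf-8
would contradict any other declared encoding.

```python
# -*- coding: utf-8 -*-
import xml.parsers.expat

try:
    text_type = unicode      # Python 2
except NameError:
    text_type = str          # Python 3


def parse_string(source):
    """Hypothetical helper: feed a byte or unicode string to Expat.

    Returns the concatenated character data, just to have something
    observable; a real parser would install its own handlers.
    """
    collected = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver character data in several chunks, so collect them.
    parser.CharacterDataHandler = collected.append
    if isinstance(source, text_type):
        # The workaround: never hand unicode to Expat, encode it first.
        # Assumption: the document declares utf-8 (or nothing at all).
        source = source.encode('utf-8')
    parser.Parse(source, True)
    return u''.join(collected)
```

With this check in place, a call such as
parse_string(u'<?xml version="1.0" encoding="utf-8"?><foo>\xfc\xf6\xe4</foo>')
no longer depends on the default ASCII codec.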

Andreas



