[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Martijn Faassen faassen at startifact.com
Mon Jan 15 08:52:42 EST 2007


Hey,

Gmane isn't updating so I can't really reply to the message (not visible 
in gmane) that I want to, but I saw the following solution proposed:

def ourparse(text):
    if isinstance(text, unicode):
        text = text.encode('UTF-8')
    xml_parser.parse(text)

Now consider what will happen if you do the following:

text = u'<?xml version="1.0" encoding="ISO-8859-1" ?><foo>Some non-ascii characters here</foo>'
ourparse(text)

What will happen is that text is encoded into a UTF-8 byte string (an 
8-bit string). It's then passed to a hopefully compliant XML parser. 
This XML parser sees an 8-bit string, and checks the encoding 
declaration for more information on the encoding of the string. It will 
therefore assume the string is in latin-1, which it no longer is. The 
parse will break with an obscure error, or the non-ascii characters 
will silently come out garbled, and the developer doing this is 
probably very confused.
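
To make the mismatch concrete, here is a minimal sketch. It assumes 
Python 3 and the standard library's xml.etree.ElementTree rather than 
zope.tal.xmlparser, and the sample document is made up for 
illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declares ISO-8859-1 and contains one non-ASCII character.
text = u'<?xml version="1.0" encoding="ISO-8859-1" ?><foo>caf\u00e9</foo>'

# The proposed workaround: always encode as UTF-8, ignoring the declaration.
guessed = text.encode('UTF-8')
# For comparison: bytes that actually match the declared encoding.
matching = text.encode('ISO-8859-1')

# The parser trusts the declaration, so each of the two UTF-8 bytes of the
# accented character is decoded as a separate Latin-1 character.
garbled = ET.fromstring(guessed).text
intact = ET.fromstring(matching).text

print(repr(garbled))
print(repr(intact))
```

With this particular parser and declared encoding the mismatch shows up 
as garbled text rather than as an exception; other parsers or encodings 
may fail outright.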

This is why it's better to refuse to guess.
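
Refusing to guess could look roughly like the following. This is only 
a sketch under the same assumptions (Python 3, xml.etree.ElementTree), 
and strict_parse is a hypothetical helper, not an existing Zope API:

```python
import xml.etree.ElementTree as ET

def strict_parse(data):
    # Hypothetical helper: accept only already-encoded bytes, so the XML
    # declaration (or the UTF-8 default) alone decides the encoding.
    if not isinstance(data, bytes):
        raise TypeError("expected encoded bytes, got %s" % type(data).__name__)
    return ET.fromstring(data)

# Bytes are parsed as usual.
root = strict_parse(b'<foo>plain</foo>')
print(root.text)

# Text is rejected instead of being silently re-encoded.
try:
    strict_parse(u'<foo>plain</foo>')
except TypeError as exc:
    print("refused:", exc)
```

The caller then has to decide which encoding to use, and that decision 
stays visible in the calling code.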

Regards,

Martijn


