[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Andreas Jung lists at zopyx.com
Sun Jan 14 13:38:14 EST 2007



--On 14 January 2007 18:14:45 +0000 Chris Withers <chris at simplistix.co.uk>
wrote:

> Dieter Maurer wrote:
>> A halfway intelligent parser would accept Unicode when it gets it
>> and concentrate on the remaining part of its task: either reporting
>> structural events or building a parse tree.
>
> The trivial fix I use in Twiddler is as follows:
>
> if isinstance(source,unicode):
>    source = source.encode('utf-8')
>
> Of course, this assumes a heading of either <?xml version="1.0"
> encoding="utf-8"?> or a missing encoding attribute, in which case the xml
> spec states that the string must be utf-8 encoded.
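
The workaround quoted above can be sketched as follows. This is only an illustration, not Twiddler's or zope.tal's code: it uses Python 3, where `str` plays the role of Python 2's `unicode`, it talks to the standard library's expat binding directly, and `parse_text` is a hypothetical helper name. The last call shows the caveat from the quote: if the declaration names a different encoding, the UTF-8 bytes get decoded incorrectly.

```python
from xml.parsers import expat


def parse_text(source):
    """Hypothetical helper: return the character data of an XML document."""
    # The workaround from the quoted message: hand the parser UTF-8 bytes.
    if isinstance(source, str):
        source = source.encode("utf-8")
    chunks = []
    parser = expat.ParserCreate()
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(source, True)
    return "".join(chunks)


# Declaration matches the bytes we produce: the text round-trips.
ok = parse_text('<?xml version="1.0" encoding="utf-8"?><p>caf\u00e9</p>')

# No declaration at all: the parser falls back to UTF-8, which also matches.
no_decl = parse_text('<p>caf\u00e9</p>')

# Declaration disagrees with the bytes: the parser trusts the declaration
# and decodes the UTF-8 bytes as iso-8859-1, so the text comes back garbled.
mismatch = parse_text('<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>')
```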

The encoding declared in the XML preamble should not matter when parsing an
XML document stored as a unicode string. It only becomes important once you
convert the document back to a byte stream, e.g. when we deliver the content
back to a browser or an FTP client. The ZPublisher (for Zope 2) deals with
that by rewriting the encoding parameter of the preamble for XML documents
to match the desired output encoding. utf-8 is always a safe choice; other
encodings like iso-8859-15 cannot represent every character and might raise
a UnicodeEncodeError. The Zope 2 publisher "avoids" this problem by encoding
the unicode result with errors='replace' (which is likely something we might
discuss :-))
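
A rough sketch of the output side, to make the trade-off concrete. This is not the ZPublisher code: `serialize_xml` and the regular expression are hypothetical, and the example is written for Python 3, where `str` corresponds to Python 2's `unicode`. It rewrites the declared encoding and then encodes, either strictly or with errors='replace'.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_RE = re.compile(r'''^(<\?xml[^>]*?encoding=)(["'])[^"']*\2''')


def serialize_xml(text, encoding="utf-8", errors="strict"):
    """Hypothetical helper: rewrite the declared encoding, then encode."""
    text = _ENCODING_RE.sub(
        lambda m: m.group(1) + m.group(2) + encoding + m.group(2),
        text,
        count=1,
    )
    return text.encode(encoding, errors)


# The euro sign exists in iso-8859-15, the snowman does not.
doc = '<?xml version="1.0" encoding="utf-8"?><p>price: 5\u20ac, snowman: \u2603</p>'

# utf-8 can represent every character, so strict encoding always works.
as_utf8 = serialize_xml(doc, "utf-8")

# Strict encoding to iso-8859-15 fails on the snowman.
try:
    serialize_xml(doc, "iso-8859-15")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='replace' never fails, but silently turns the snowman into "?".
as_latin9 = serialize_xml(doc, "iso-8859-15", errors="replace")
```

The replaced output is still well-formed XML, but the original character is lost without any warning, which is why the errors='replace' behaviour is worth discussing.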

Andreas