[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Andreas Jung lists at zopyx.com
Mon Jan 15 09:01:50 EST 2007



--On 15 January 2007 14:52:42 +0100 Martijn Faassen 
<faassen at startifact.com> wrote:

> Hey,
>
> Gmane isn't updating so I can't really reply to the message (not visible
> in gmane) that I want to, but I saw the following solution proposed:
>
> def ourparse(text):
>     if isinstance(text, unicode):
>         text = text.encode('UTF-8')
>     xml_parser.parse(text)
>
> Now consider what will happen if you do the following:
>
> text = (u'<?xml version="1.0" encoding="ISO-8859-1" ?>'
>         u'<foo>Some non-ascii characters here</foo>')
> ourparse(text)
>
> What will happen is that text is converted to a UTF-8 encoded 8-bit
> string. It's then passed to a hopefully compliant XML parser. This XML
> parser sees an 8-bit string and checks the encoding declaration for
> more information on the encoding of the string. It will therefore
> assume the string is in latin-1, although the bytes are really UTF-8.
> The parse will then break with an obscure error, or silently produce
> garbled characters, and the developer doing this is probably very
> confused.
>

OK, got it. But this problem can be solved easily by changing the encoding
named in the XML declaration (the preamble) to UTF-8 before parsing.
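
A minimal sketch of that idea, not tested against zope.tal itself. It
assumes Python 3 (where str plays the role of unicode) and uses the
standard library's expat parser as a stand-in for xml_parser; the helper
name to_utf8_bytes is hypothetical:

```python
import re
import xml.parsers.expat

# Matches the encoding pseudo-attribute of an XML declaration that sits at
# the very start of the document (assumption: no leading whitespace or BOM).
_DECL_ENCODING = re.compile(
    r'''^(<\?xml[^>]*?encoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2'''
)


def to_utf8_bytes(text):
    """Encode text as UTF-8 and make the XML declaration agree with it."""
    if isinstance(text, bytes):
        # Already encoded: trust whatever the declaration says.
        return text
    # Rewrite e.g. encoding="ISO-8859-1" to encoding="UTF-8", keeping the
    # original quote style. Text without a declaration is left unchanged.
    text = _DECL_ENCODING.sub(r'\g<1>\g<2>UTF-8\g<2>', text, count=1)
    return text.encode('UTF-8')


def ourparse(text):
    """Parse text with expat and return the collected character data."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(to_utf8_bytes(text), True)
    return ''.join(chunks)
```

With this, the declaration and the actual byte encoding stay in sync, so
the parser no longer reads UTF-8 bytes as latin-1. Input without an
encoding declaration needs no rewrite, since XML parsers default to UTF-8
in that case.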

-aj