[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Philipp von Weitershausen philipp at weitershausen.de
Sun Jan 14 13:40:35 EST 2007


On 14 Jan 2007, at 19:14, Chris Withers wrote:
> Dieter Maurer wrote:
>> A halfway intelligent parser would accept Unicode when it gets it
>> and concentrate on the remaining part of its task: either reporting
>> structural events or building a parse tree.
>
> The trivial fix I use in Twiddler is as follows:
>
> if isinstance(source, unicode):
>     source = source.encode('utf-8')

It's the same fix I used.
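
For illustration, here is a minimal sketch of that workaround wrapped around a plain expat parser. It is not the zope.tal code: `parse_text` is a hypothetical helper, and it is written for current Python, where the text type is `str` rather than `unicode`.

```python
import xml.parsers.expat


def parse_text(source):
    """Parse text or bytes with expat and return the element names seen.

    Hypothetical helper for illustration; not part of zope.tal.
    """
    # Same trick as above: hand the parser UTF-8 bytes, never text.
    # (On Python 2 the check would be isinstance(source, unicode).)
    if isinstance(source, str):
        source = source.encode('utf-8')

    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Record each start tag so the caller can see the parse succeeded.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(source, True)
    return names
```

Calling `parse_text(u'<node><child/></node>')` should behave the same as passing the equivalent UTF-8 byte string.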

> Of course, this assumes an XML declaration of either <?xml version="1.0"
> encoding="utf-8"?> or one with no encoding attribute, in which case
> the XML spec says the parser must assume UTF-8 (absent a byte order mark).
>
> The problem comes when someone sends you something like:
>
> u'<?xml version="1.0" encoding="something-else"?><node />'
>
> What should be done then?

Not sure. We could ignore the declared encoding or raise an error. I'm
inclined to ignore it.
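
To make the two options concrete, here is a rough sketch, assuming the conflict is detected with a regular expression on the leading XML declaration. `prepare_source` and its `strict` flag are hypothetical names, not an existing zope.tal API. "Ignoring" is implemented here by rewriting the declared encoding to utf-8 before encoding, so that a byte-oriented parser does not trip over the stale declaration.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
# Group 3 is the declared encoding name.
_DECL_ENCODING = re.compile(
    r'^(<\?xml[^>]*?\sencoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def prepare_source(source, strict=False):
    """Return UTF-8 bytes for the parser (hypothetical helper).

    Bytes are passed through untouched. For text, a declared encoding
    other than UTF-8 is either rejected (strict=True) or overwritten.
    """
    if isinstance(source, bytes):
        return source

    match = _DECL_ENCODING.match(source)
    if match and match.group(3).lower() not in ('utf-8', 'utf8'):
        if strict:
            # Option 1: raise an error.
            raise ValueError(
                'text input declares encoding %r' % match.group(3))
        # Option 2: ignore the declaration by replacing it with utf-8.
        source = source[:match.start(3)] + 'utf-8' + source[match.end(3):]

    return source.encode('utf-8')
```

With the default `strict=False`, the example string from above comes back as UTF-8 bytes whose declaration reads encoding="utf-8"; with `strict=True` it raises `ValueError` instead.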
