[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Philipp von Weitershausen philipp at weitershausen.de
Sun Jan 14 13:38:00 EST 2007


On 14 Jan 2007, at 18:37 , Dieter Maurer wrote:
> Philipp von Weitershausen wrote at 2007-1-14 14:59 +0100:
>> ...
>> Traditionally, you parse an 8-bit string, figure out its encoding (e.g.
>> from <?xml version="1.0" encoding="utf-8"?>) and return some
>> representation of that XML with unicode data. That's why it's actually
>> quite ok for XML parsers to only accept string data.
>
> Parsing usually means rebuilding the structure from a text string and
> *NOT* encoding guessing or Unicode decoding.
>
> Therefore, it is actually quite stupid for a parser
> to try to encode an already decoded string (i.e. a Unicode string)
> only so that it can guess the encoding ;-)
> A halfway intelligent parser would accept Unicode when it gets it
> and concentrate on the remaining part of its task: either reporting
> structural events or building a parse tree.

Yes, I agree. Unfortunately, expat isn't smart enough, which caused
this whole discussion.
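
For illustration, here is a minimal sketch of the usual workaround: if the
input is already decoded, encode it to UTF-8 and tell expat to use that
encoding regardless of what the XML declaration says; otherwise hand the
bytes over and let expat work out the encoding itself. It is written against
the standard library's xml.parsers.expat in present-day Python, not against
zope.tal.xmlparser, and the helper name parse_text is made up for this
example.

```python
import xml.parsers.expat


def parse_text(source):
    """Return the character data of an XML document given as str or bytes.

    Hypothetical helper for illustration; not part of zope.tal.
    """
    chunks = []
    if isinstance(source, str):
        # Already decoded: encode to UTF-8 and override whatever encoding
        # the XML declaration claims, since it no longer applies.
        data = source.encode("utf-8")
        parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    else:
        # Raw bytes: let expat read the encoding from the declaration.
        data = source
        parser = xml.parsers.expat.ParserCreate()
    # expat may deliver character data in several pieces, so collect them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)
```

Both parse_text(u"<a>caf\u00e9</a>") and the same document passed as
ISO-8859-1 bytes with a matching declaration should yield the same decoded
text. The point of the override is that a stale encoding declaration inside
an already decoded string cannot mislead the parser.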


