[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Dieter Maurer dieter at handshake.de
Sun Jan 14 12:37:31 EST 2007


Philipp von Weitershausen wrote at 2007-1-14 14:59 +0100:
> ...
>Traditionally, you parse an 8bit string, figure out its encoding (e.g.
>from <?xml encoding="utf-8"?>) and return some representation of that XML
>with unicode data. That's why it's actually quite ok for XML parsers to 
>only accept string data.

Parsing usually means rebuilding the structure from a text string and *NOT*
encoding guessing or Unicode decoding.

Therefore, it is actually quite stupid for a parser
to try to encode an already decoded string (i.e. a Unicode string)
only so that it can then guess the encoding ;-)
A halfway intelligent parser would accept Unicode when it gets it
and concentrate on the remaining part of its task: either reporting
structural events or building a parse tree.
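
To illustrate the point, here is a minimal sketch in present-day Python
(where "str" is the Unicode type) built on the standard library's expat
binding. It is not zope.tal's code; the helper name structural_events
is made up for this example. It takes either decoded text or encoded
bytes and reports the same structural events for both:

```python
import xml.parsers.expat


def structural_events(source):
    """Parse XML given as text (str) or as encoded bytes.

    Hypothetical helper for illustration: returns a list of
    (kind, payload) tuples describing the document structure.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver adjacent character data as one piece instead of fragments.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(source, True)
    return events


# Already decoded input: there is no encoding left to figure out.
decoded = "<p>caf\u00e9</p>"

# Encoded input: the parser reads the encoding from the XML declaration.
encoded = (
    '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'
).encode("iso-8859-1")

assert structural_events(decoded) == structural_events(encoded)
```

The caller does not have to encode a Unicode string just to hand it to
the parser; either kind of input yields the same events.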



-- 
Dieter

