[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Martijn Faassen faassen at startifact.com
Tue Jan 16 17:19:15 EST 2007


Dieter Maurer wrote:
> Martijn Faassen wrote at 2007-1-15 15:44 +0100:
>> ....
>> Hey,
>>
>> On 1/15/07, Andreas Jung <lists at zopyx.com> wrote:
>> [snip]
>>> ok, got it. But this problem can be solved easily by changing the encoding
>>> within the preamble.
>> I would say refusing to guess and bailing out with an error message is
>> better in this case.
> 
> I disagree with you.
> 
>   Logically, parsing an encoded XML document consists of two
>   passes: decode the encoded string into unicode and reconstruct
>   the XML info elements from the serialization.
> 
>   Traditionally, these two passes are not performed one after
>   the other but folded together in a single pass.
>   
>   But that tradition should not prevent us from separating out the
>   (Unicode) decoding phase. And after this phase is done,
>   there is no ambiguity left with the "XML declaration".
>   Its encoding attribute is simply irrelevant for the second phase
>   (apart from generating the PI info element).

That's nice as far as it goes. What if after the second phase you need 
to parse the XML again? What do you do with your encoding header then? 
If it's irrelevant, you had better strip it out before you put it into the
parser.
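
To make the "strip it out" idea concrete, here is a minimal sketch in
present-day Python 3. It is an assumption-laden illustration, not what
zope.tal does: the helper names strip_encoding_declaration and
parse_decoded are hypothetical, and the standard library's ElementTree
stands in for the real parser. It decodes first, then removes the encoding
pseudo-attribute from the XML declaration so the text can be serialized
and parsed again without a stale header.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside a leading XML declaration,
# e.g. the ' encoding="iso-8859-1"' part of '<?xml version="1.0" ...?>'.
_ENCODING_RE = re.compile(
    r"""^(<\?xml[^>]*?)\s+encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\2""")


def strip_encoding_declaration(text):
    """Return already-decoded XML text without the encoding declaration.

    Hypothetical helper: text that has no such declaration is returned
    unchanged.
    """
    return _ENCODING_RE.sub(r"\1", text, count=1)


def parse_decoded(data, encoding):
    """Phase one: decode bytes to text. Phase two: parse the text.

    The encoding header is dropped in between, because after decoding it
    no longer describes the string that is handed to the parser.
    """
    text = data.decode(encoding)
    return ET.fromstring(strip_encoding_declaration(text))


raw = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'.encode(
    "iso-8859-1")
root = parse_decoded(raw, "iso-8859-1")
```

The regular expression only looks at a declaration at the very start of
the text; a document with a leading byte order mark or whitespace would
need extra handling that this sketch leaves out.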

Regards,

Martijn


