[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode

Andreas Jung lists at zopyx.com
Tue Jan 16 01:55:37 EST 2007



--On 15. Januar 2007 22:15:46 +0100 Martijn Faassen 
<faassen at startifact.com> wrote:
>
> My point is that:
>
> u"<?xml version="1.0" encoding="ISO-8859-1"?><foo>Some non-ascii
> text</foo>"
>
> is confusing at best. One part of this says it's a unicode string, the
> other part says it's in encoding latin-1.

The string above would be used for internal storage but *not* for
processing. Btw. this is no different from storing HTML files as unicode
strings. An application must convert the unicode string back to a serialized
string - either to the encoding specified inside the preamble or to a
'general' encoding that covers the whole unicode range, like utf-8, while
changing the encoding inside the preamble accordingly - both are legitimate
approaches. There is no ambiguity. A smart XML parser will represent an XML
document *independent* of the source encoding in the most general way
(storing textual content as unicode, or as utf-8 at least).
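
To make the two serialization options concrete, here is a rough sketch in
Python 3 syntax. It is not the zope.tal or ZPT code; the function name
serialize_xml and the regular expression are made up for illustration, and
it assumes the declaration, if present, sits at the very start of the text.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Groups: 1 = everything up to the quote, 2 = the quote character, 3 = the name.
_DECL_ENCODING = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def serialize_xml(text, target_encoding=None):
    """Encode unicode XML text to bytes, keeping the preamble consistent.

    With target_encoding=None the declared encoding is used (utf-8 if the
    text declares none). Otherwise the declaration is rewritten to name
    target_encoding before the text is encoded.
    """
    match = _DECL_ENCODING.match(text)
    declared = match.group(3) if match else "utf-8"
    encoding = target_encoding or declared
    if match and target_encoding:
        # Swap only the encoding name, preserving the original quote style.
        text = _DECL_ENCODING.sub(
            lambda m: m.group(1) + m.group(2) + encoding + m.group(2),
            text,
            count=1,
        )
    return text.encode(encoding)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\xe9</foo>'
as_declared = serialize_xml(doc)         # bytes in ISO-8859-1, preamble untouched
as_utf8 = serialize_xml(doc, "utf-8")    # bytes in utf-8, preamble rewritten
```

In both cases the bytes that leave the application agree with their own
declaration. The sketch does not add a declaration when none exists, so a
non-utf-8 target for such a document would need extra handling.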

>> I still don't see what should be ambiguous with this approach.
>
> Ambiguous in that the string seems to say it's in two encodings at once.
> You're then "guessing": you're letting the Python string type trump the
> declaration. Then, since we've shown that leads to bugs, you propose
> actually change the encoding declaration of the XML document. I wonder
> what people then expect to happen upon serialization. In effect, your
> proposal would, I think, serialize to UTF-8 only, right? (in which case
> the encoding declaration can be dropped as it's the default.)

When you download a ZPT through FTP/WebDAV, the unicode representation
of the XML is converted using the 'output_encoding' property of the
corresponding ZPT, which is set when a new XML document is uploaded (and
taken from the preamble). So when you upload a latin1 XML file you should
get it back as valid latin1 through FTP/WebDAV.
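
The upload/download round trip can be sketched roughly like this (Python 3
syntax). StoredTemplate, upload and download are hypothetical stand-ins,
not the real ZPT classes or methods; only the 'output_encoding' name comes
from the description above. The sketch assumes an ASCII-compatible source
encoding, so it would not handle UTF-16 uploads.

```python
import re

# Reads the encoding name from a leading XML declaration in raw bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


class StoredTemplate:
    """Hypothetical stand-in for a ZPT: unicode text plus an output encoding."""

    def __init__(self, text, output_encoding):
        self.text = text
        self.output_encoding = output_encoding


def upload(raw):
    """Decode uploaded bytes using the encoding named in the preamble."""
    match = _DECL.match(raw)
    # Fall back to utf-8, the XML default when no encoding is declared.
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return StoredTemplate(raw.decode(encoding), encoding)


def download(template):
    """Serialize the stored unicode text with the remembered encoding."""
    return template.text.encode(template.output_encoding)


raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\xe9</foo>'
template = upload(raw)
assert download(template) == raw  # latin1 in, the same latin1 bytes out
```

Internally the text is plain unicode; the declared encoding only matters
again at the moment the document is written back out.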

When you download text/xml content through the ZPublisher, the
ZPublisher converts unicode textual content to an encoding that is
either taken from an already set 'content-type: text/...; charset=XXXXX'
HTTP header or, as a fallback, from the zpublisher-default-encoding property
as defined in the zope.conf file.
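
The selection rule described above amounts to something like the following
sketch (Python 3 syntax). This is not the ZPublisher implementation;
response_charset, encode_body and DEFAULT_ENCODING are names invented here,
with DEFAULT_ENCODING standing in for the zpublisher-default-encoding value.

```python
import re

# Hypothetical stand-in for the zpublisher-default-encoding setting.
DEFAULT_ENCODING = "utf-8"

# Picks the charset parameter out of a content-type header value.
_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._-]+)"?', re.IGNORECASE)


def response_charset(content_type, default=DEFAULT_ENCODING):
    """Return the charset of a text/* content-type header, else the default."""
    if content_type and content_type.strip().lower().startswith("text/"):
        match = _CHARSET.search(content_type)
        if match:
            return match.group(1)
    return default


def encode_body(body, content_type=None, default=DEFAULT_ENCODING):
    """Encode a unicode response body with the selected charset."""
    return body.encode(response_charset(content_type, default))


latin1_body = encode_body("caf\xe9", "text/xml; charset=iso-8859-1")
fallback_body = encode_body("caf\xe9", "text/xml")  # no charset: default applies
```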

So the application can specify in both cases the encoding of the serialized
XML content. Where is the problem?

Andreas