[ZCM] [ZC] 1474/ 2 Comment "PageTemplateFile opens XML files in binary mode"

Collector: Zope Bugs, Features, and Patches ... zope-coders-admin at zope.org
Tue Oct 5 12:34:26 EDT 2004


Issue #1474 Update (Comment) "PageTemplateFile opens XML files in binary mode"
 Status Pending, Zope/bug medium
To followup, visit:
  http://collector.zope.org/Zope/1474

==============================================================
= Comment - Entry #2 by yuppie on Oct 5, 2004 12:34 pm

Fred Drake wrote:
> This report isn't clear.  Please update the issue and explain what the
> problem is; glancing at the code on the Zope 2 and Zope 3 trunks, the
> only thing that looks suspicious to me is that re-opening an HTML file
> doesn't use Python's universal newline support.
> 
> HTML is always text, so should be treated that way on input.  XML may
> contain textual content, but should always be handed to the XML parser
> as a raw byte stream to allow the proper decoding machinery a shot at
> doing the right thing.
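The point about handing XML to the parser as raw bytes can be illustrated with Python's standard library. This is a minimal Python 3 sketch, not Zope code; the sample document is made up. Given bytes, the parser reads the XML declaration and decodes accordingly. Decoding the same bytes up front with a different codec, as a text-mode read with a fixed encoding would, can fail before the parser ever sees the declaration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: byte 0xE9 is 'é' in ISO-8859-1, and the declaration says so.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Caf\xe9</name>'

# Handed raw bytes, the parser honours the encoding declaration.
root = ET.fromstring(raw)
print(root.text)

# Decoding first with a different codec (here UTF-8) breaks on the same input.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("premature decode failed:", exc.reason)
```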

Let me try to restate the issue:

This is a problem in CMFSetup. CMFSetup creates XML using PageTemplateFiles. These template files are checked into CVS in text mode, so depending on the platform of the checkout they contain different newlines. When a file is opened in text mode, these newlines are normalized to LF. When it is opened in binary mode, they are not. Normalization could be done at a later stage, but it isn't. So line breaks are not normalized before parsing, even though the parser expects LF newlines.
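The difference between the two ways of opening a file can be seen with a small Python 3 sketch (in the Python 2 of that era, universal newlines were requested with mode "rU"). The helper name `read_both_ways` and the file contents are hypothetical; the file simply imitates a template checked out on Windows with CR/LF line endings.

```python
import os
import tempfile


def read_both_ways(path):
    """Return (text_mode_result, binary_mode_result) for the same file."""
    # Text mode with the default newline=None enables universal newlines:
    # both '\r\n' and a lone '\r' are translated to '\n' on read.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Binary mode returns the bytes exactly as stored, with no translation.
    with open(path, "rb") as f:
        raw = f.read()
    return text, raw


# Hypothetical template as it would look after a CVS checkout on Windows.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b"<root>\r\n  <item/>\r\n</root>\r\n")
    path = tmp.name

text, raw = read_both_ways(path)
os.remove(path)
print(repr(text))  # '<root>\n  <item/>\n</root>\n'
print(repr(raw))   # b'<root>\r\n  <item/>\r\n</root>\r\n'
```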

When removing newlines, the parser removes only the LF and leaves the CR in place. When adding newlines, it adds LF. Existing newlines are preserved as CR/LF. So the returned XML contains all three kinds of newlines: CR/LF, LF, and lone CR.
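To check whether a rendered document shows this symptom, a small diagnostic can count each kind of line break. This is an illustrative sketch; the function name `newline_kinds` is hypothetical, and the sample bytes are made up rather than taken from actual CMFSetup output.

```python
import re


def newline_kinds(data: bytes) -> dict:
    """Count CR/LF pairs, lone LFs and lone CRs in a byte string."""
    crlf = data.count(b"\r\n")
    # A lone LF is an LF not preceded by CR.
    lone_lf = len(re.findall(rb"(?<!\r)\n", data))
    # A lone CR is a CR not followed by LF.
    lone_cr = len(re.findall(rb"\r(?!\n)", data))
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


# Made-up output showing all three kinds, as described in the report.
print(newline_kinds(b"<a>\r\n<b/>\r<c/>\n</a>"))
# {'CRLF': 1, 'LF': 1, 'CR': 1}
```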

This is what the XML 1.0 spec says:

"""2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character."""
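The normalization rule quoted above can be applied to the raw bytes before they reach the parser. The following is a minimal sketch, not the fix adopted in Zope; the function name is hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where CR and LF are the single bytes 0x0D and 0x0A (it would be wrong for UTF-16 or UTF-32).

```python
def normalize_xml_line_ends(data: bytes) -> bytes:
    """Apply XML 1.0 section 2.11 end-of-line handling to raw bytes.

    Assumes an ASCII-compatible encoding; not valid for UTF-16/UTF-32.
    """
    # Order matters: translate the two-byte CR LF sequence first,
    # then any CR that is not followed by LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(normalize_xml_line_ends(b"a\r\nb\rc\n"))  # b'a\nb\nc\n'
```

Because the function works on bytes, its result can still be handed to the XML parser undecoded, so the parser remains responsible for detecting the encoding.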
________________________________________
= Request - Entry #1 by yuppie on Aug 19, 2004 11:49 am

This is a problem on Windows. If I read the spec ( http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends ) correctly, Windows newlines are allowed within XML. But PageTemplateFile opens XML files in binary mode, ignoring the fact that the file might contain CRs. As a result, the parsed files contain a mix of CR/LF, LF and even lone CR newlines.

Is there any good reason why this was fixed for HTML, but not for XML files?
==============================================================


