[Zope3-dev] Re: Input encoding of a PageTemplateFile

Thu Aug 4 09:27:58 EDT 2005

On 7/22/05, Dmitry Vasiliev <lists at hlabs.spb.ru> wrote:
> I think about the following generic algorithm:
> 
> 1. Preparation stage. Content type and encoding are determined from the
> <?xml?>/<meta> declarations. In case of the 'text/html' type and non-Unicode
> content we decode the content. In case of the 'text/xml' type the parser takes
> care of the encoding at the cooking stage. We can do it somewhere inside
> PageTemplate.pt_edit()/PageTemplate.write() methods.

This is probably right; I'll have to look at the code again.
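
For illustration only, here is a rough sketch of what that preparation
step could look like. It is written in present-day Python 3, not the
Python of this thread, and sniff() and prepare() are hypothetical names
rather than anything in the PageTemplate API. The regular expressions
are deliberately simplistic and will miss edge cases.

```python
import re

# Hypothetical, simplified patterns; real declarations have more variants.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def sniff(source: bytes, default: str = "utf-8"):
    """Guess (content_type, encoding) from <?xml?>/<meta> declarations."""
    if source.startswith(b"<?xml"):
        m = XML_DECL.match(source)
        return "text/xml", (m.group(1).decode("ascii") if m else default)
    m = META_CHARSET.search(source)
    return "text/html", (m.group(1).decode("ascii") if m else default)


def prepare(source: bytes):
    """Decode HTML up front; leave XML as bytes for the parser to handle."""
    content_type, encoding = sniff(source)
    if content_type == "text/html":
        return content_type, source.decode(encoding)
    return content_type, source
```

The default of "utf-8" when no declaration is found is an assumption of
the sketch, not something settled in this thread.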

> 2. Cooking stage. Nothing interesting for our case.

Wrong; this is when the "bytecode" is generated.  At this point, we
can remove the encoding markers (since we've already used them for
input).

> 3. Rendering stage. Now we can strip the <?xml?>/<meta> declarations. We can do
> it somewhere inside PageTemplate.pt_render()/PageTemplate.__call__() methods.

Rendering is the most costly stage, so we want to reduce the work done
here.  Avoiding it entirely is best.  By removing the encoding markers
at compilation time, we manage to have nothing else to do at this
stage.
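
A minimal sketch of the idea, assuming the source has already been
decoded to text: the markers are stripped once when the template is
cooked, and rendering just reuses the cooked result. SketchTemplate and
its methods are hypothetical stand-ins, not the real PageTemplate class,
and the "cooked" form here is plain text rather than actual bytecode.

```python
import re

# Hypothetical, simplified patterns for the two kinds of encoding marker.
XML_DECL_RE = re.compile(r'^\s*<\?xml[^>]*\?>\s*')
META_CT_RE = re.compile(
    r'<meta[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>\s*',
    re.IGNORECASE)


class SketchTemplate:
    """Toy stand-in for a template with separate cook and render steps."""

    def __init__(self, text):
        self._text = text
        self._cooked = None
        self.cook_count = 0  # only here to show how often cooking runs

    def cook(self):
        # Strip the encoding markers once, at compilation time.
        text = XML_DECL_RE.sub("", self._text, count=1)
        text = META_CT_RE.sub("", text, count=1)
        self._cooked = text
        self.cook_count += 1

    def render(self):
        # Rendering does no marker handling; it reuses the cooked form.
        if self._cooked is None:
            self.cook()
        return self._cooked
```

Rendering the same template repeatedly then pays the stripping cost
only on the first call.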

> BTW, just curious why we need to read HTML files in the text mode (See
> PageTemplateFile._read_file())?

I don't remember, but it seemed important at the time.  It likely has
something to do with newline normalization; the XML parser handles
that for us since the XML specification requires it to, but the HTML
parser doesn't bother.

I doubt this is important in practice, but it may be relied on in the tests.
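
If newline normalization is indeed the reason, the effect amounts to
something like the following sketch, which mirrors the end-of-line
handling the XML specification requires of parsers (CR LF and lone CR
both become LF). normalize_newlines() is a hypothetical helper, not a
function in PageTemplateFile.

```python
def normalize_newlines(text: str) -> str:
    """Translate CR LF pairs and lone CR characters to a single LF."""
    # Replace the two-character sequence first so it yields one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```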

  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at gmail.com>
Zope Corporation