[Zope3-dev] string agnostic page templates, again

Wed Sep 8 06:12:08 EDT 2004

Fred Drake wrote:
> On Tue, 07 Sep 2004 19:23:10 +0200, Martijn Faassen <faassen at infrae.com> wrote:
> 
>>Right; Five already exposes Zope 3's version to Zope 2, hopefully a step
>>in the right direction. :)
> 
> Once we have all the issues here solved, that's probably just fine.  I
> hope so, since I hope to make the TAL and Products.PageTemplates
> packages thin API shims over zope.tal, zope.tales, and
> zope.(app.?)pagetemplate eventually.
> 
>>What if the first thing on the stream is latin-1 and then unicode gets
>>added?
>  
> Then you're hosed if you get an 8-bit character in the Latin-1, but
> that seems to be the case now.  How should the TAL interpreter know
> whether you're using Latin-1 or UTF-8?  I don't see a good way to deal
> with this one without knowing what each bit of text is.  And the only
> way to be sure of that is to use Unicode.

Right, there's indeed no good way to deal with it, so we shouldn't try. The 
system should, however, know when it's getting a non-plain-ASCII classic 
string if we want to improve the error reporting. Non-ASCII and Unicode 
don't mix; ASCII mixes with both.
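
A minimal sketch of that mixing rule, assuming a modern Python where `bytes` stands in for the classic 8-bit string and `str` for unicode. The names `classify` and `can_mix` are made up for illustration and are not part of zope.tal:

```python
def classify(chunk):
    """Return 'unicode', 'ascii', or 'encoded' for one chunk of output."""
    if isinstance(chunk, str):
        return "unicode"
    if isinstance(chunk, bytes):
        # Any byte above 127 means an unknown 8-bit encoding is in play.
        return "ascii" if all(b < 128 for b in chunk) else "encoded"
    raise TypeError("expected str or bytes, got %r" % type(chunk))


def can_mix(a, b):
    """Only the pair (unicode, non-ASCII 8-bit) is unsolvable."""
    return {classify(a), classify(b)} != {"unicode", "encoded"}
```

Here `can_mix(b"abc", "x")` is true, while `can_mix(b"caf\xe9", "x")` is false because nothing says which encoding the `\xe9` byte belongs to.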

>>What if the first thing on the stream is a non-string object?
>  
> Then it has to buffer; it can't decide what to use without some clue.

I hadn't considered that strategy yet. It may of course theoretically 
lead to surprising results, as a later call may actually alter the first 
result retroactively. I think that this is mostly theoretical, however.

>>It sounds like a good approach, just slightly worried about edge cases
>>like the ones above.
> 
> We have to figure out which edge cases can be handled magically.  I
> don't think the Latin-1 vs. Latin-5 vs. UTF-8 case is solvable using
> magic.

Agreed.

>>It should break horribly as soon as possible (presumably earlier than in
>>.getvalue() as is happening now) as soon as encoded (non-ascii) text is
>>mixed with unicode.
>  
> Once Unicode is seen, it can immediately switch to Unicode for
> everything.  If we never get Unicode, we avoid the conversion
> entirely.  Otherwise, we need the magic and can't generate the error
> early.  So we get the error as soon as we know the target type.

It'd be nice to get an error as soon as text containing characters above 
127 (that is, ord(c) > 127) has been seen and then unicode gets added. I 
don't know whether there's a reasonably fast way to check for such 
characters, though.
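
One candidate for such a check, as a sketch only: let the ASCII codec do the scan, since it runs in C rather than in a Python-level loop. This assumes `bytes` as the stand-in for the classic 8-bit string; `has_high_bytes` is a hypothetical helper name, and whether this is fast enough for the TAL interpreter would need measuring:

```python
def has_high_bytes(chunk):
    """Return True if the 8-bit string contains any byte above 127."""
    try:
        # The ASCII codec rejects the first byte outside 0..127.
        chunk.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False
```

On Python 3.7 and later, `not chunk.isascii()` performs the same test directly.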

It'd also be nice to get an error as soon as text containing characters 
above 127 gets added after we've already seen unicode.

Unless there are strong efficiency reasons against it, I'd like to see 
such immediate errors; see below.

>>Going through some sequences, in semi-regex patterns:
>  
> Are these some sort of behavioral expectations?  Or was there an
> implied question here?

They're behavioral expectations. I'd like to see immediate errors being 
raised in the engine *as soon as* something unsolvable is going on, 
instead of what is currently the case, where you only find out at the 
end of the engine, when all the strings get joined together. Getting an 
immediate error should make things easier to debug, as you can hopefully 
see more easily where you went wrong.
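
To make the expectation concrete, here is a rough sketch of an output stream that fails in write() rather than in getvalue(). It is not the zope.tal implementation; `StrictTextStream` and `MixedTextError` are hypothetical names, and `bytes`/`str` again stand in for classic strings and unicode:

```python
class MixedTextError(UnicodeError):
    """Raised when non-ASCII 8-bit text and unicode meet in one stream."""


class StrictTextStream:
    def __init__(self):
        self._chunks = []
        self._seen_unicode = False
        self._seen_encoded = False  # 8-bit text with bytes above 127

    def write(self, chunk):
        if isinstance(chunk, str):
            if self._seen_encoded:
                raise MixedTextError(
                    "unicode written after non-ASCII 8-bit text")
            self._seen_unicode = True
        elif isinstance(chunk, bytes):
            if any(b > 127 for b in chunk):
                if self._seen_unicode:
                    raise MixedTextError(
                        "non-ASCII 8-bit text written after unicode")
                self._seen_encoded = True
        else:
            raise TypeError("expected str or bytes")
        self._chunks.append(chunk)

    def getvalue(self):
        if self._seen_unicode:
            # Every bytes chunk is known to be plain ASCII at this point.
            return "".join(
                c if isinstance(c, str) else c.decode("ascii")
                for c in self._chunks)
        # No unicode was ever seen, so no conversion is needed.
        return b"".join(self._chunks)
```

With this shape the traceback points at the write() call that introduced the conflict, and a stream that never sees unicode still avoids the conversion entirely.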

Regards,

Martijn