[Zope-CMF] Unicode for ReST?

Charlie Clark charlie.clark at clark-consulting.eu
Mon Apr 26 16:05:42 EDT 2010


On 26.04.2010, 11:24, yuppie <y.2010 at wcm-solutions.de> wrote:

> Actually *all* strings passed to PageTemplates should be decoded, no
> matter which browser you use. That's the only sane way to mix encoded
> strings with unicode strings.

'Tis true, but those are still the most pathetic, as it means they get
offered Latin-1.
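yuppie's point about decoding everything before it reaches a PageTemplate can be illustrated in modern Python 3 terms. In the Python 2 that Zope ran on in 2010, adding a non-ASCII encoded str to a unicode string triggered an implicit ASCII decode and a UnicodeDecodeError. Python 3 refuses to mix bytes and text at all. The helper `to_text` below is a hypothetical name, not a Zope API, and the UTF-8 default is an assumption:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text (str), decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def mixing_fails(text, data):
    """Return True if concatenating text and raw bytes raises TypeError."""
    try:
        text + data
    except TypeError:
        return True
    return False


title = "Café"                     # already text
body = "Grüße".encode("utf-8")     # encoded bytes, e.g. as read from storage

print(mixing_fails(title, body))        # → True
print(to_text(title) + to_text(body))   # → CaféGrüße
```

Decoding once, at the boundary, means the template only ever sees one string type.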

>> I looked a bit into the system and saw that we still use ReST in a very
>> Wallace & Gromit way: ReST encodes the generated HTML using the default
>> encoding from zope.conf, we promptly decode it back to unicode every
>> time we want to display it, and we have to make sure default-encoding
>> and rest-encoding match. Adding output='unicode' to Document's
>> CookedBody() removes the double-encoding and doesn't break any tests.
>> Would it be okay to add this for Document and News objects and adjust
>> the views accordingly?
> Not sure I understand what you propose. Would that mean calling
> CookedBody(output='unicode') converts the persistent cooked_text to
> unicode and calling CookedBody() converts it back?

Sorry, that was a very poor explanation on my part. The underlying
conversion from ReST to HTML accepts an output_encoding argument:


def HTML(src,
          writer='html4css1',
          report_level=1,
          stylesheet=None,
          input_encoding=default_input_encoding,
          output_encoding=default_output_encoding,
          language_code=default_language_code,
          initial_header_level = initial_header_level,
          warnings = None,
          settings = {}):

And later on:

     if output_encoding != 'unicode':
         return output.encode(output_encoding)
     else:
         return output

So it's really quite braindead not to pass output_encoding='unicode' in
the ReST call in Document.py.
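A minimal stand-alone sketch of the branch quoted above shows why the encode-then-decode round trip is wasted work, and why it silently depends on the two encoding settings agreeing. `finish_output` is a hypothetical name that only mirrors the quoted `if`/`else`; it is not the reStructuredText or docutils API, and the HTML string is a stand-in for real writer output:

```python
def finish_output(output, output_encoding):
    """Mirror the quoted branch: encode unless 'unicode' is requested."""
    if output_encoding != "unicode":
        return output.encode(output_encoding)
    return output


html = "<p>Grüße</p>"   # stand-in for what the ReST writer produces

# Current approach: encode with one setting, decode again in the view.
encoded = finish_output(html, "utf-8")
round_trip = encoded.decode("utf-8")

# If the two encoding settings ever disagree, the text is silently mangled.
mangled = encoded.decode("latin-1")

# Asking for 'unicode' skips both steps.
direct = finish_output(html, "unicode")

print(round_trip == direct)   # → True
print(mangled == direct)      # → False
```

Note that the Latin-1 decode does not raise: every byte is valid Latin-1, so a mismatch shows up as mojibake rather than as an error.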

> CookedBody() is meant to *get* the cooked body. It only updates
> cooked_text if you use a new STX or ReST level. (BTW a nasty  
> write-on-read.)

Yes, fixing these warts is probably more important.

> _edit() normally *sets* cooked_text.
> On interface level, I think we can explicitly allow CookedBody() to
> return encoded strings *or* unicode. I'd prefer that strategy over
> adding an 'output' argument to all get methods.
> On implementation level, content types shipped with CMF could always set
> cooked_text as unicode.
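The split yuppie describes (the setter writes cooked_text, the getter only reads) can be sketched with a heavily simplified, hypothetical class. `SketchDocument`, `render` and the `level` parameter are illustrative assumptions and do not reproduce CMF's actual Document or its CookedBody() signature. Here a different level is rendered on the fly instead of being written back, which removes the write-on-read:

```python
def render(text, level):
    """Hypothetical stand-in for the STX/ReST renderer; returns text (str)."""
    return "<h%d>%s</h%d>" % (level, text, level)


class SketchDocument:
    """Hypothetical, heavily simplified stand-in for a CMF Document."""

    def __init__(self, text="", level=1):
        self._edit(text, level)

    def _edit(self, text, level=1):
        # The setter is the only place cooked_text is written,
        # and it is always stored as text, never as encoded bytes.
        self.text = text
        self._level = level
        self.cooked_text = render(text, level)

    def CookedBody(self, level=None):
        # Pure getter: a different level is rendered on the fly
        # instead of overwriting the persistent cooked_text.
        if level is None or level == self._level:
            return self.cooked_text
        return render(self.text, level)


doc = SketchDocument("Grüße", level=1)
print(doc.CookedBody())          # → <h1>Grüße</h1>
print(doc.CookedBody(level=2))   # → <h2>Grüße</h2>
print(doc.cooked_text)           # → <h1>Grüße</h1>
```

The trade-off is that a non-default level is re-rendered on every call rather than cached.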

> The most work would be to write an upgrade step (including tests) that
> works reliably. So far we don't have any upgrade steps that update
> content items.
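The core of such an upgrade step could look roughly like the hypothetical function below. Plain Python objects stand in for content items. Registering the step with GenericSetup and finding the items (e.g. via the catalog) are deliberately not shown. The sketch assumes all stored bytes use one known encoding, with UTF-8 used here only as an example. Making the step idempotent is one way to get the "works reliably" part:

```python
from types import SimpleNamespace


def upgrade_cooked_text(items, encoding="utf-8"):
    """Hypothetical upgrade step: decode byte-string cooked_text in place.

    Returns the number of items changed. Items that already hold text,
    or have no cooked_text at all, are left untouched, so running the
    step twice is harmless.
    """
    changed = 0
    for item in items:
        value = getattr(item, "cooked_text", None)
        if isinstance(value, bytes):
            item.cooked_text = value.decode(encoding)
            changed += 1
    return changed


docs = [
    SimpleNamespace(cooked_text="<p>Grüße</p>".encode("utf-8")),
    SimpleNamespace(cooked_text="<p>already text</p>"),
    SimpleNamespace(),  # no cooked_text yet
]
print(upgrade_cooked_text(docs))  # → 1
print(upgrade_cooked_text(docs))  # → 0
```

A real step would also have to decide what to do with items whose bytes do not decode in the assumed encoding. This sketch simply lets the UnicodeDecodeError propagate.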

Okay. We'll see how it goes.

Charlie
-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

