[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings

Florent Guillaume fg at nuxeo.com
Wed Jun 7 08:55:33 EDT 2006


yuppie wrote:
> Hi Yves!
> 
> 
> Yves Bastide wrote:
>> GenericSetup has problems handling non-ASCII data.
> 
> 1.) GenericSetup explicitly doesn't support non-UTF-8 XML in profiles. 
> UTF-8 is the default encoding for XML and I can't see a need to support 
> other XML encodings.
> 
> 2.) GenericSetup explicitly doesn't support non-UTF-8 site settings. If 
> someone provides a good patch this feature can be added.
> 
> 3.) GenericSetup is not tested with non-ASCII UTF-8 site settings. AFAIK 
> import works, but not export. I consider this a bug.
> 
>> It treats strings sometimes as ASCII, sometimes as UTF-8, yet it has 
>> access to two variables: its own ISetupContext.getEncoding() (whose 
>> use I didn't fully grok) and CMF's 
>> ISetupContext.getSite().getProperty('default_charset').
> 
> Sorry, but your assumptions are wrong:
> 
> - The default setup tool creates export contexts without specifying the 
> encoding, so ISetupContext.getEncoding() always returns None. And even 
> if it were set, it would represent the encoding of the exported files, not 
> the site encoding.
> 
> - getSite().getProperty('default_charset') is CMF specific and should 
> not be used in GenericSetup.
> 
> - The adapters adapt ISetupEnviron, not ISetupContext. getEncoding() and 
> getSite() are not always available.
> 
>> Attached is a patch using both of them and somewhat working in my 
>> setup. Can knowledgeable people comment on it before I enter a 
>> collector issue? (I'm using GS alongside CPS, which also needs 
>> some patching; yet basic things, such as exporting and importing an 
>> iso8859-15 Title in a CMF site whose default charset is iso8859-15, 
>> should work.)
> 
> First of all we need unit tests that make sure UTF-8 works and I think 
> this should be the default used by GenericSetup. Code that needs to know 
> how to find the site encoding can't be generic.
> 
> There is an additional problem: If tools use the default property edit 
> page from OFS the properties might have a different encoding than 
> 'default_charset' of the site. Since the default 
> 'management_page_charset' is UTF-8 we have less trouble if we allow only 
> UTF-8.

Let's also not forget that the goal in CMF 2 (I think) is to have all 
content be unicode strings, never encoded ones. In that case GenericSetup 
only has to deal with the XML file's encoding (always UTF-8 anyway) and 
nothing else.
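
To illustrate the "unicode inside, UTF-8 only at the XML boundary" idea, 
here is a minimal sketch using only the Python standard library. It is not 
GenericSetup code: export_title() and import_title() are hypothetical 
helper names, and the <object>/<property> layout is just an assumed 
stand-in for a profile file.

```python
from xml.dom.minidom import Document, parseString


def export_title(title):
    """Serialize a text (unicode) title into UTF-8 encoded XML bytes."""
    doc = Document()
    root = doc.createElement("object")
    prop = doc.createElement("property")
    prop.setAttribute("name", "title")
    prop.appendChild(doc.createTextNode(title))
    root.appendChild(prop)
    doc.appendChild(root)
    # Encoding happens only here, at the file boundary.
    return doc.toxml(encoding="utf-8")


def import_title(xml_bytes):
    """Parse UTF-8 XML bytes and return the title as text (unicode)."""
    doc = parseString(xml_bytes)
    prop = doc.getElementsByTagName("property")[0]
    return prop.firstChild.data


title = "Caf\u00e9 \u20ac"
data = export_title(title)
restored = import_title(data)
```

With this split, the application code never sees an encoded string, so 
there is no site charset to look up.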

CPS 3 was a pure-latin1 application for various historical reasons, so we 
modified a number of I/O adapters so that they properly encode/decode what 
GenericSetup works with. CPS 3.4 is moving from the hardcoded latin-1 to 
the site's default_charset property, but that hasn't been done everywhere 
yet, although it should be.
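
The kind of conversion such an adapter has to do can be sketched roughly 
as follows. This is not the actual CPS code: the function names are 
hypothetical, and SITE_CHARSET is an assumed stand-in for whatever the 
site's default_charset property holds.

```python
# Assumption: stands in for the site's default_charset property.
SITE_CHARSET = "iso-8859-15"


def site_to_text(raw, charset=SITE_CHARSET):
    """Decode a site-encoded byte string into text before export."""
    return raw.decode(charset)


def text_to_site(text, charset=SITE_CHARSET):
    """Encode imported text back into the site's charset."""
    return text.encode(charset)


def text_to_xml(text):
    """Encode text for the profile file, which is always UTF-8."""
    return text.encode("utf-8")


raw = b"10 \xa4"              # 0xA4 is the euro sign in iso-8859-15
text = site_to_text(raw)      # text as handled by the XML layer
xml_bytes = text_to_xml(text)
round_trip = text_to_site(text)
```

Passing the charset as a parameter instead of hardcoding it is what the 
move from latin-1 to default_charset amounts to.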

CPS 4 will be purely unicode, and won't need all that shit.

Florent

-- 
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg at nuxeo.com

