[Zope] Dealing with non-ASCII strings?

Toby Dickenson tdickenson@geminidataloggers.com
Wed, 2 Oct 2002 09:32:49 +0100


On Wednesday 02 Oct 2002 9:06 am, Jean Jordaan wrote:

> Hi there
>
> In the data that we have to work with, there are names in French,
> Turkish, German, Greek, etc. A sample string, when printed from Python,
> is: 'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'
> We'd like to store this data in LDAP and in Zope.
>
> Questions:
>
>   - How do we find out what the current encoding of the strings is?
>     Guess?

Guessing is your only option if you can't ask the person who supplied you
with your data.
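One practical way to guess is to decode the bytes with a shortlist of candidate encodings and let someone who reads the language pick the result that looks right. Below is a minimal Python 3 sketch. The candidate list and the helper name `candidate_decodings` are hypothetical. Strict decoding rules out UTF-8 for this sample, because the byte 0xDD followed by "z" is not a valid UTF-8 sequence. Single-byte codecs such as ISO-8859-1 and ISO-8859-9, however, accept every byte value, so a human still has to judge the output.

```python
# Python 3 sketch: SAMPLE is the byte string from the question.
SAMPLE = b'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'

# Hypothetical shortlist of encodings to try; adjust it for your data.
CANDIDATES = [
    'utf-8',        # rejected here: 0xDD must be followed by a continuation byte
    'iso-8859-1',   # Latin-1, Western European
    'iso-8859-9',   # Latin-5, Turkish
    'iso-8859-13',  # Python's 'latin7' alias, Baltic
    'cp1254',       # Windows Turkish
]


def candidate_decodings(data, encodings):
    """Decode data with each encoding, skipping any that reject the bytes.

    Returns a dict mapping encoding name to decoded text, in the order tried.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)  # strict error handling
        except UnicodeDecodeError:
            continue  # this encoding cannot be right for these bytes
    return results


results = candidate_decodings(SAMPLE, CANDIDATES)
# Show each entry of results to someone who reads the language;
# the decoding that looks correct is the best guess.
```

With ISO-8859-9, the byte 0xDD becomes "İ" (U+0130). With Latin-1, the same byte becomes "Ý". Only a reader who knows the language can tell which one is right.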

>   - Say we decide it's Latin-7. How do we convert from the current
>     string to Unicode, taking into account the fact that the source is
>     taken to be Latin-7?

unicode_string = unicode(encoded_8bit_string, 'data character encoding')
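In Python 3, text is already Unicode (`str`), so the counterpart of the `unicode(...)` call above is `bytes.decode`. Below is a minimal sketch. The helper name `to_unicode` is hypothetical, and the ISO-8859-9 choice is only an assumption about this sample. One caution about codec names: Python registers "latin7" as an alias of ISO-8859-13 (Baltic). Turkish is usually ISO-8859-9 (alias "latin5"), and Greek is ISO-8859-7. Check which table you actually mean before relying on an alias.

```python
SAMPLE = b'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'


def to_unicode(raw: bytes, encoding: str) -> str:
    """Python 3 counterpart of unicode(raw, encoding) from the reply.

    Raises LookupError for an unknown encoding name and
    UnicodeDecodeError for a byte the codec does not define.
    """
    return raw.decode(encoding)


# Assumption: the data is Turkish text in ISO-8859-9 (alias 'latin5').
text = to_unicode(SAMPLE, 'iso-8859-9')
# With a single-byte codec, each input byte becomes exactly one character.
```

Decoding once, where the data enters the system, keeps the rest of the code working only with Unicode. Passing `errors='replace'` to `decode` substitutes U+FFFD instead of raising, but it hides bad guesses, so strict decoding is the safer default while you are still choosing the encoding.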

>   - Do we need to move to Zope 2.6 in order to cope with such strings?

It depends on what you want to do with them. You need 2.6 if you want to use
them in property pages or in DTML, or to allow them to be edited in forms.

(You could get patches for Zope 2.4 from
http://www.zope.org/Members/htrd/wstring. They don't apply cleanly to 2.5, but
are known to work after a little manual merging. Overall I think the 2.6
upgrade will be less painful.)


If you want to continue using an unpatched 2.5.x, then you will need to
manually call the Unicode string's encode method every time you use it:

unicode_string.encode('page character encoding')
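The same "encode at output time" rule still applies today. Below is a minimal Python 3 sketch of encoding Unicode text just before it is written into a page. The helper name `encode_for_page` is hypothetical. The `xmlcharrefreplace` error handler is a standard Python codec error handler that turns characters the target encoding cannot represent into numeric character references. That is only appropriate when the output is HTML or XML.

```python
def encode_for_page(text: str, page_encoding: str) -> bytes:
    """Encode Unicode text just before writing it into an HTML page.

    Hypothetical helper: characters that page_encoding cannot represent
    become numeric character references (e.g. &#304;) instead of raising
    UnicodeEncodeError, so a browser can still display them.
    """
    return text.encode(page_encoding, errors='xmlcharrefreplace')


text = 'Rabia-r\u0130za Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'

utf8_page = encode_for_page(text, 'utf-8')         # every character fits
latin1_page = encode_for_page(text, 'iso-8859-1')  # U+0130 does not fit
print(latin1_page)
```

Serving pages as UTF-8 avoids the fallback entirely. The character-reference route is a workaround for pages that must stay in an 8-bit encoding.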