[Grok-dev] Re: Understanding unicode

Philipp von Weitershausen philipp at weitershausen.de
Sun Sep 23 10:45:13 EDT 2007


On 22 Sep 2007, at 19:48 , Jan Ulrich Hasecke wrote:
> We looked everywhere and finally found the place where the
> values in my choice list are encoded to ASCII.
>
> In zope.schema vocabulary.py
>
> there is:
>
>     def __init__(self, value, token=None, title=None):
>         """Create a term for value and token. If token is omitted,
>         str(value) is used for the token.  If title is provided,
>         term implements ITitledTokenizedTerm.
>         """
>         self.value = value
>         if token is None:
>             token = value
>         self.token = str(token)
>         self.title = title
>         if title is not None:
>             directlyProvides(self, ITitledTokenizedTerm)
>
> self.token = str(token) converts the values from my Choice list to
> ASCII, so there is an error when the values contain non-ASCII
> characters, as in u'Paviankäfig'
>
> if you change the line to
>
> self.token = unicode(token)
>
> it works.
>
> Please have a look at this solution, maybe there are side effects.  
> But I hope that this is a good solution.

No, it's not. Sorry.

The str(token) is there for a reason. The vocabulary spec says that
tokens should be *ASCII*. Not unicode. Not 8-bit strings. Just ASCII.
So ideally, str(token) should always work. The problem is that one
line above it says "token = value", so any value at all can end up
in str(), which defeats the purpose of the str(token) line.
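
To illustrate the failure: in Python 2, str() on a unicode object
implicitly encodes it as ASCII. The sketch below makes that step explicit
with .encode("ascii"). It is written for Python 3 (an assumption, so that
it runs on a current interpreter) and is only an illustration, not the
zope.schema code.

```python
# The example value from the report; it contains the non-ASCII character "ä".
value = "Paviankäfig"

try:
    # Python 2's str(u'...') performs this ASCII encoding implicitly.
    token = value.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    token = None
    error = exc

# The encoding fails, so no token is produced.
print(error)
```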

What this code should really do is

- check if a one-to-one mapping between the values' types and ASCII  
can be arranged. It can be for all integers, floats, and pure-ASCII  
strings. To support the whole unicode range, UTF-7 would have to be  
used.

- if there are objects that can't be mapped to an ASCII
representation (e.g. arbitrary objects), then a *useful* error
message should be shown indicating that a vocabulary/source should be
used instead.
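
A minimal sketch of the two steps above, assuming Python 3 (where str is
the unicode type). The name ascii_token is hypothetical and not part of
zope.schema; the choice of repr() for floats and the wording of the error
message are also assumptions.

```python
def ascii_token(value):
    """Return a pure-ASCII token for value, or raise TypeError.

    Hypothetical helper, not part of zope.schema.
    """
    if isinstance(value, int):
        # Decimal digits (and a sign) are always ASCII.
        return str(value)
    if isinstance(value, float):
        # repr() of a float is ASCII and round-trips through float().
        return repr(value)
    if isinstance(value, str):
        # UTF-7 output is pure ASCII and can be decoded back to the
        # original text, so the whole unicode range is covered.
        return value.encode("utf-7").decode("ascii")
    # Anything else has no obvious ASCII representation.
    raise TypeError(
        "cannot derive an ASCII token from %r; "
        "use a vocabulary/source instead" % (value,)
    )
```

Note that this maps each type to ASCII separately. A list that mixes types
(say the integer 1 and the string "1") would still produce colliding
tokens, so a real fix would have to deal with that as well.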


