[Grok-dev] Re: Understanding unicode

Sun Sep 23 12:55:36 EDT 2007

Hi Jan-Ulrich,

On Saturday, 22.09.2007, at 19:48 +0200, Jan Ulrich Hasecke wrote:
> On 31.08.2007 at 22:32, Philipp von Weitershausen wrote:
> > Jan Ulrich Hasecke wrote:
> >
> 
> Hi all,
> 
> Yesterday at the meeting of the Zope User Group Rhineland we tracked
> down this error. Special thanks to Uli and Charlie Clark. Today I
> tried to find a solution, and I hope I have found a fix for it.
> 
> To recap the problem, here are my code and the traceback:
> 
> ----- zoo.py ----------
> 
> class GehegeBauen(grok.AddForm):
> 	"""The view to add a cage for elephants"""
> 	grok.context(GrokZoo)
> 	grok.name('gehegebauen')
> 	form_fields = grok.Fields(
> 	name=Choice(title=u'Gehege', values= 
> [u'Elefantengehege',u'Giraffengehege', u'Paviankäfig']),
> 	groesse=Int(title=u"Wieviele Tiere sollen Platz haben?"))
> 	
> 	label= u'Neues Gehege bauen'
> 	@grok.action('Gehege bauen')
> 	def add(self,name, groesse=8):
> 		if name in self.context.keys():
> 			return
> 			self.redirect(self.url('index'))
> 		self.context.bauegehege(name, groesse, tierart='default', kosten=100)
> 		self.redirect(self.url('index'))
> 
[...]
> Traceback (most recent call last):
[...]
>      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4'  
> in position 7: ordinal not in range(128)

> 
[...]
> In zope.schema vocabulary.py
> 
> there is:
> 
>      def __init__(self, value, token=None, title=None):
>          """Create a term for value and token. If token is omitted,
>          str(value) is used for the token.  If title is provided,
>          term implements ITitledTokenizedTerm.
>          """
>          self.value = value
>          if token is None:
>              token = value
>          self.token = str(token)
>          self.title = title
>          if title is not None:
>              directlyProvides(self, ITitledTokenizedTerm)
> 
> self.token = str(token) converts the values from my Choice list to
> ASCII, so there is an error when a value contains non-ASCII
> characters, as in u'Paviankäfig'.
> 
> if you change the line to
> 
> self.token = unicode(token)
> 
> it works.
> 
> Please have a look at this solution, maybe there are side effects.  
> But I hope that this is a good solution.

Sorry, but I agree with Philipp that this is not a good solution. As
Philipp already said, what is created here is a token, and tokens should
be (7-bit) ASCII. See also zope.schema.fields.txt for deeper insights.

What the code snippet above shows is the distinction between values and
tokens, made at the point where a vocabulary is created. Vocabularies, as
I understand them, basically map values to simple ASCII representations
(tokens) and/or 'terms'. See Philipp's book, chapter 17.
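
As a rough illustration of that value/token mapping, here is a toy model in
plain Python. It is only a sketch: ToyVocabulary and its methods are
hypothetical names, not the zope.schema API, and it assumes Python 3, where
every string is a unicode string.

```python
# Toy model only: ToyVocabulary, value_for() and tokens() are hypothetical
# names for illustration, not part of zope.schema.
class ToyVocabulary(object):
    def __init__(self, pairs):
        # pairs: iterable of (token, value) tuples
        self._by_token = {}
        for token, value in pairs:
            # Tokens must be plain ASCII; this raises UnicodeEncodeError
            # for any token that is not.
            token.encode('ascii')
            self._by_token[token] = value

    def value_for(self, token):
        # A form submits the token; the vocabulary maps it back to the value.
        return self._by_token[token]

    def tokens(self):
        return sorted(self._by_token)


vocab = ToyVocabulary([('elefanten', u'Elefantengehege'),
                       ('pavian', u'Paviankäfig')])
```

The point of the split is that only the token has to survive a round trip
through an HTML form, so the value itself can be any object, including a
string with umlauts.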

I now wonder whether there is a simple way to create Choice fields
containing umlauts (and Chinese characters ;-). Somewhere in JUH's
example, the fromValues() classmethod of SimpleVocabulary is called:

	zope.schema.vocabulary.SimpleVocabulary.fromValues(<list>)

to create a simple vocabulary. This maps the values from the <list>
one-to-one to tokens and fails miserably if a value is not proper
ASCII. This was the cause of your original problem.
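
The failure can be reproduced without Zope. The sketch below is an
assumption-laden stand-in: make_token() is a hypothetical helper, and the
explicit encode('ascii') imitates what str() does implicitly to a unicode
object on Python 2 (the sketch itself assumes Python 3).

```python
def make_token(value):
    # Hypothetical helper, not zope.schema code. The explicit ASCII encode
    # stands in for Python 2's implicit str(unicode_value) conversion.
    return value.encode('ascii').decode('ascii')


# Pure-ASCII values pass through unchanged.
tokens = [make_token(v) for v in [u'Elefantengehege', u'Giraffengehege']]

# A value with an umlaut cannot be turned into an ASCII token.
failed_at = None
try:
    make_token(u'Paviankäfig')
except UnicodeEncodeError as exc:
    # Index of the first character that could not be encoded
    failed_at = exc.start
```

Here failed_at points at the 'ä', the same character the traceback complains
about.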

To work around this, one could use another SimpleVocabulary classmethod,
fromItems(), which lets you pass distinct tokens and values, like this:

	zope.schema.vocabulary.SimpleVocabulary.fromItems(
	    [('kuestenseeschwalbe', u'Küstenseeschwalbe')]
	)

Here 'kuestenseeschwalbe' is the token and u'Küstenseeschwalbe' is the
value. In this case, I would have expected a select box to be rendered
something like this:

       <select ...>
          <option value='kuestenseeschwalbe'>Küstenseeschwalbe</option>
       </select>

Instead, I get:

       <select ...>
          <option value='kuestenseeschwalbe'>kuestenseeschwalbe</option>
       </select>

i.e. only the token is used (twice).
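
One possible explanation, sketched as a toy model: if the widget displays a
term's title when there is one and falls back to the token otherwise, then a
term created without a title would produce exactly this output. This is an
assumption read off the observed HTML, not verified against the widget code;
ToyTerm and render_option() are hypothetical names, and the sketch does no
HTML escaping.

```python
from collections import namedtuple

# Toy stand-in for a vocabulary term; not the zope.schema class.
ToyTerm = namedtuple('ToyTerm', 'value token title')


def render_option(term):
    # Assumption: show the title if the term has one, else the token.
    text = term.title if term.title is not None else term.token
    return u"<option value='%s'>%s</option>" % (term.token, text)


# A term without a title, as the fromItems() call above seems to create.
untitled = ToyTerm(u'Küstenseeschwalbe', 'kuestenseeschwalbe', None)

# The same term with an explicit title.
titled = ToyTerm(u'Küstenseeschwalbe', 'kuestenseeschwalbe',
                 u'Küstenseeschwalbe')
```

In this model render_option(untitled) repeats the token as the display text,
while render_option(titled) shows the umlaut version.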

Is there a simple way to create Choice fields with simple
vocabularies and umlauts? Or is the basic Choice field just too limited
in that respect? Sorry if this has drifted off-topic.

Kind regards,

-- 
Uli