[Grok-dev] Tests, Unicode and Fileencoding

Uli Fouquet uli at gnufix.de
Sat Nov 17 11:50:19 EST 2007


Jan Ulrich Hasecke wrote:

> starting to use tests while developing my app,

Good move! Go ahead :-)

>  I discovered that you  
> can only use unicode strings in tests, when you specified the file  
> encoding of the testfile.
> So my zoo.txt starts with:
> ------snip----------
> # -*- coding: utf-8 -*-
> =========================
>   The Online Game GrokZoo
> =========================
> (...)
> -----snap-----------
> Is that the intended behaviour?

Though I'm no expert on this topic, I think it is merely the (Python)
default behaviour, not the intended behaviour.

> What is the default encoding /bin/test expects?

/bin/test does not expect a certain encoding. It only looks for tests
and runs them. This is good from my point of view, because others might
prefer other encodings than utf8. I think your test setup code is to
blame instead (at least, if you have 'borrowed' it from me).

>  ASCII? So why ASCII?

Registering doctest files as unittest test suites (your example code
looks like it does that) usually means calling the Python standard
library function ``doctest.DocFileSuite()`` in the test setup. Have a
look at your test setup code.

``DocFileSuite()`` returns a ``unittest.TestSuite`` and, as far as I
know, takes the system standard encoding as default. But you can pass
an optional ``encoding`` parameter to set up the docfiles with a
certain non-standard encoding.
See http://docs.python.org/lib/doctest-unittest-api.html

For example::

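    import doctest
    import unittest

    # DOCTESTFILES: your list of doctest file names, defined elsewhere
    def test_suite():
        suite = unittest.TestSuite()
        for filename in DOCTESTFILES:
            suite.addTest(doctest.DocFileSuite(
                filename, encoding='utf8'))  ## SET THE ENCODING HERE... ##
        return suite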

would expect all your doctest files to be utf8 encoded.
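
If you want to try the ``encoding`` parameter in isolation, here is a
minimal, self-contained sketch. It is not taken from any Grok code: the
file content, the helper names ``make_suite`` and ``run_demo`` and the
temporary ``zoo.txt`` are made up for illustration, and it is written
for a current Python 3, so details may differ on older versions.

```python
import doctest
import os
import tempfile
import unittest

# Hypothetical doctest file content containing a non-ASCII character.
DOCTEST_TEXT = (
    "The Online Game GrokZoo\n"
    "=======================\n"
    "\n"
    "    >>> name = 'Käfig'\n"
    "    >>> len(name)\n"
    "    5\n"
)


def make_suite(path):
    # module_relative=False lets us pass a plain filesystem path;
    # encoding tells doctest how to decode the file's bytes.
    return doctest.DocFileSuite(
        path, module_relative=False, encoding='utf-8')


def run_demo():
    # Write the doctest file as UTF-8, then load and run it.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'zoo.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(DOCTEST_TEXT)
        result = unittest.TestResult()
        make_suite(path).run(result)
        return result


if __name__ == '__main__':
    result = run_demo()
    print(result.testsRun, result.wasSuccessful())
```

The point is only that the encoding is named once, in the test setup,
instead of in every doctest file.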

Marking the doctest files with `# -*- coding: utf-8 -*-` as you did
doesn't look like too much heavy lifting to me either. Interesting
that it works :-)

Did you get ``UnicodeError`` before?

> Wouldn't utf-8 be better, since we claim to have unicode everywhere  
> in Zope?

There is a difference between 'having unicode' and 'everything is utf8
encoded'. Python's current internal unicode representation for example
is UCS2 or UCS4 if I remember correctly. If you meant that every input
and output from and to 'Zope' should be utf8 (or utf16), then I happily
leave this discussion to the gurus :-)
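
To illustrate the difference, here is a small sketch (the sample word
is just something I picked, and it assumes the source file itself is
read as utf8): one and the same unicode string becomes different byte
sequences depending on the encoding you choose.

```python
text = u'Käfig'  # a unicode string: five characters, no bytes yet

# Encoding turns the abstract characters into concrete bytes.
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16-le')
latin1_bytes = text.encode('latin-1')

# Same text, three different byte sequences with different lengths.
print(len(text), len(utf8_bytes), len(utf16_bytes), len(latin1_bytes))

# Decoding with the matching codec gives back the same unicode string.
assert utf8_bytes.decode('utf-8') == text
assert utf16_bytes.decode('utf-16-le') == text
assert latin1_bytes.decode('latin-1') == text
```

So 'unicode everywhere' says nothing yet about which of these byte
forms a file on disk should use.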

With the grok.testing extension, BTW, one could set up a different
standard encoding to be expected in doctest files. This could solve that
little problem for Grok. But to be honest, I don't see this as a real
problem and can live without it.

Kind regards,

