[Grok-dev] Tests, Unicode and Fileencoding

Sat Nov 17 11:50:19 EST 2007

Hi JUH,

Jan Ulrich Hasecke wrote:

> starting to use tests while developing my app,

Good move! Go ahead :-)

>  I discovered that you  
> can only use unicode strings in tests, when you specified the file  
> encoding of the testfile.
> 
> So my zoo.txt starts with:
> 
> ------snip----------
> 
> # -*- coding: utf-8 -*-
> 
> =========================
>   The Online Game GrokZoo
> =========================
> 
> (...)
> 
> -----snap-----------
> 
> Is that the intended behaviour?

I am not very deep into this topic, but I think it is merely the default
(Python) behaviour rather than intended behaviour.

> What is the default encoding /bin/test expects?

/bin/test does not expect any particular encoding. It only looks for tests
and runs them. This is good from my point of view, because others might
prefer encodings other than UTF-8. I think your test setup code is to
blame instead (at least if you have 'borrowed' it from me).

>  ASCII? So why ASCII?

Registering doctest files as unittest test suites (which is what your
example code looks like) usually means calling the Python standard library
function ``doctest.DocFileSuite()`` in the test setup. Have a look at your
test setup code.

``DocFileSuite()`` returns a ``unittest.TestSuite`` and takes the system
standard encoding as default. But you can pass an optional ``encoding``
parameter to set up the doc files with a specific, non-standard encoding.
See http://docs.python.org/lib/doctest-unittest-api.html

For example::

    import doctest
    import unittest

    # DOCTESTFILES, mypackagename, setUpZope and cleanUpZope come from
    # your own test setup.
    def test_suite():
        suite = unittest.TestSuite()
        for filename in DOCTESTFILES:
            suite.addTest(doctest.DocFileSuite(
                filename,
                package=mypackagename,
                setUp=setUpZope,
                tearDown=cleanUpZope,
                encoding='utf8',  # set the encoding here
                optionflags=(doctest.ELLIPSIS |
                             doctest.NORMALIZE_WHITESPACE),
            ))
        return suite

This would expect all your doctest files to be UTF-8 encoded.
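If you want to try the ``encoding`` parameter in isolation, here is a
minimal, self-contained sketch. It assumes a current Python 3 and leaves out
all Zope setup; the file name, the doctest text and the helper
``run_utf8_docfile()`` are made up for illustration and are not part of Grok
or of your app.

```python
import doctest
import os
import tempfile
import unittest

# Hypothetical doctest file content containing a non-ASCII character.
DOCTEST_TEXT = (
    "The Online Game GrokZoo\n"
    "=======================\n"
    "\n"
    "    >>> name = 'Zoowärter'\n"
    "    >>> len(name)\n"
    "    9\n"
)


def run_utf8_docfile():
    """Write a UTF-8 doctest file and run it with an explicit encoding."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "zoo.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(DOCTEST_TEXT)
        suite = doctest.DocFileSuite(
            path,
            module_relative=False,  # treat path as a plain filesystem path
            encoding="utf-8",       # decode the doctest file as UTF-8
        )
        result = unittest.TestResult()
        suite.run(result)
        return result


if __name__ == "__main__":
    outcome = run_utf8_docfile()
    print(outcome.testsRun, outcome.wasSuccessful())
```

The point is only that the encoding is stated once in the test setup, so the
doctest file itself needs no special first line.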

Marking the doctest files with `# -*- coding: utf-8 -*-`, as you did,
doesn't look like too much of a burden to me either. Interesting that it
works :-)
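For what it is worth, such a coding line follows the same convention Python
uses for its own source files (PEP 263). The sketch below, assuming Python 3,
uses the standard library's ``tokenize.detect_encoding()`` to show how such a
line is read. I am not claiming that this is what the test runner does with
your doctest file; ``declared_encoding()`` is just a helper name I made up.

```python
import io
import tokenize


def declared_encoding(data: bytes) -> str:
    """Return the encoding Python's tokenizer would assume for these bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


if __name__ == "__main__":
    sample = b"# -*- coding: utf-8 -*-\n\nThe Online Game GrokZoo\n"
    print(declared_encoding(sample))
```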

Did you get ``UnicodeError`` before?

> Wouldn't utf-8 be better, since we claim to have unicode everywhere  
> in Zope?

There is a difference between 'having unicode' and 'everything is UTF-8
encoded'. Python's current internal unicode representation, for example,
is UCS-2 or UCS-4 (depending on how the interpreter was built), if I
remember correctly. If you meant that every input and output from and to
'Zope' should be UTF-8 (or UTF-16), then I happily leave this discussion
to the gurus :-)
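To illustrate the difference, here is a small sketch (assuming Python 3,
where ``str`` is the unicode type). The same text has one sequence of
characters but different byte sequences depending on the encoding chosen;
the function name ``compare_encodings()`` and the sample word are just
examples.

```python
def compare_encodings(text):
    """Encode the same unicode text in two ways and compare the results."""
    utf8_bytes = text.encode("utf-8")
    utf16_bytes = text.encode("utf-16-le")
    return {
        "characters": len(text),
        "utf8_length": len(utf8_bytes),
        "utf16_length": len(utf16_bytes),
        # Both byte strings decode back to the very same unicode text.
        "same_text": (utf8_bytes.decode("utf-8")
                      == utf16_bytes.decode("utf-16-le")
                      == text),
    }


if __name__ == "__main__":
    print(compare_encodings("Zoowärter"))
```

So 'unicode everywhere' says what the text is inside the application, while
the encoding only matters where text is read from or written to bytes, such
as a doctest file on disk.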

By the way, with the grok.testing extension one could set up a different
default encoding to be expected in doctest files. This could solve that
little problem for Grok. But to be honest, I don't see this as a real
problem and can live without it.

Kind regards,

-- 
Uli