[Zope-Coders] Analysis: BTrees and Unicode and Python

Andreas Jung <andreas@zope.com>
Fri, 19 Oct 2001 11:43:30 -0400


After lots of debugging, here is an explanation for the behaviour we have
seen in the unittest:

- The BTree code calls PyObject_Compare() several times before the
  comparison that failed (unicode vs. unicode)

- one of these earlier comparisons checks a Python string (containing
  an accented character) against a unicode string and raises a
  unicode exception (ASCII decoding error: ordinal not in range(128)),
  presumably because the default encoding is ascii.

- the BTree code does not check for an exception after
  PyObject_Compare(), so this error was never cleared

- when trying to compare two identical unicode strings, Python
  calls default_3way_compare() and runs into the following code:


static int
default_3way_compare(PyObject *v, PyObject *w)
{
    int c;
    char *vname, *wname;

    if (v->ob_type == w->ob_type) {
        /* When comparing these pointers, they must be cast to
         * integer types (i.e. Py_uintptr_t, our spelling of C9X's
         * uintptr_t).  ANSI specifies that pointer compares other
         * than == and != to non-related structures are undefined.
         */
        Py_uintptr_t vv = (Py_uintptr_t)v;
        Py_uintptr_t ww = (Py_uintptr_t)w;
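        puts("\t\t\tdefcmp 1");  /* debugging trace added for this analysis */
        return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
    }
    /* ... rest of default_3way_compare() omitted ... */
}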

  This code returns -1 for the two identical unicode strings.

I am not sure this code can compare two unicode strings by value: it
only compares the two object addresses. On the other hand, it is still
strange that the unittest passes when the same unicode string in the
test data list is replaced with self.s, as described earlier.

Any ideas about that?

Andreas