[ZODB-Dev] BTree data loss bug under Python 3.x

Sat May 25 15:49:20 UTC 2013

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/25/2013 02:50 AM, Marius Gedminas wrote:
> On Fri, May 24, 2013 at 09:31:55PM -0400, Tres Seaver wrote:
>> On 05/24/2013 01:28 PM, Marius Gedminas wrote:
>>> I was trying to debug a failing ZEO test case 
>>> (testConcurrentUpdates), and found out it had nothing to do with 
>>> ZEO.  There's a bug either in BTrees or in ZODB itself that
>>> causes data corruption under Python 3.x.
>>> 
>>> Here's a test case: 
>>> https://gist.github.com/mgedmin/5644876#file-zodbfail_simple-py
>> 
>> Thanks for the detective work!  I can confirm the bug, and that it
>> is in the BTrees C extensions (the tests pass after 'rm 
>> .tox/py32/lib/python3.2/site-packages/BTrees/*.so').
> 
> It's the COMPARE macro in _compat.h.  This patch fixes the data
> corruption:
> 
> diff --git a/BTrees/_compat.h b/BTrees/_compat.h index
> e004d54..4765df1 100644 --- a/BTrees/_compat.h +++ b/BTrees/_compat.h 
> @@ -28,8 +28,8 @@ #define TEXT_FORMAT PyUnicode_Format
> 
> #define COMPARE(lhs, rhs) \ -    PyObject_RichCompareBool((lhs),
> (rhs), Py_LT) > 0 ? -1 : \ -    (PyObject_RichCompareBool((lhs),
> (rhs), Py_EQ) > 0 ? 0 : 1) +    (PyObject_RichCompareBool((lhs),
> (rhs), Py_LT) > 0 ? -1 : \ +     (PyObject_RichCompareBool((lhs),
> (rhs), Py_EQ) > 0 ? 0 : 1))
> 
> 
> #else
> 
> 
> I'm now trying to write a failing unit test.  It's a bit difficult to 
> get access to C macros from Python tests ;-)
> 
> COMPARE is used in two places:
> 
> BTrees/objectkeymacros.h-#define TEST_KEY_SET_OR(V, KEY, TARGET) \ 
> BTrees/objectkeymacros.h:if ( ( (V) = COMPARE((KEY),(TARGET)) ),
> PyErr_Occurred() )
> 
> and
> 
> BTrees/objectvaluemacros.h:#define TEST_VALUE(VALUE, TARGET)
> COMPARE((VALUE),(TARGET))
> 
> We can ignore TEST_KEY_SET_OR, as it works fine there.
> 
> TEST_VALUE is used to implement BTree.byValue and bucket merging.
> The bug I was chasing down was caused by bucket mismerges, I think.
> 
> It should be easier to write a test for byValue.  In fact there is
> one, in test__base.py, but it's not run against real C-implemented
> BTrees.  I copied it over into test_OOBTree.py and discovered that it
> fails unexpectedly:
> 
> diff --git a/BTrees/tests/test_OOBTree.py
> b/BTrees/tests/test_OOBTree.py index bc046bc..e813f88 100644 ---
> a/BTrees/tests/test_OOBTree.py +++ b/BTrees/tests/test_OOBTree.py @@
> -99,9 +99,15 @@ class OOSetPyTest(ExtendedSetTests,
> unittest.TestCase):
> 
> class OOBTreeTest(BTreeTests, unittest.TestCase):
> 
> -    def _makeOne(self): +    def _makeOne(self, *args): from
> BTrees.OOBTree import OOBTree -        return OOBTree() +
> return OOBTree(*args) + +    def test_byValue(self): +        ITEMS =
> [(y, x) for x, y in enumerate('abcdefghijklmnopqrstuvwxyz')] +
> tree = self._makeOne(ITEMS) +
> self.assertEqual(list(tree.byValue(22)), +
> [(y, x) for x, y in ITEMS[22:]])
> 
> def testRejectDefaultComparison(self): # Check that passing int keys w
> default comparison fails. @@ -146,9 +152,9 @@ class
> OOBTreePyTest(OOBTreeTest): # default 'object' comparison semantics. 
> #class OOBTreePyTest(BTreeTests, unittest.TestCase):
> 
> -    def _makeOne(self): +    def _makeOne(self, *args): from
> BTrees.OOBTree import OOBTreePy -        return OOBTreePy() +
> return OOBTreePy(*args)
> 
> 
> 
> 
> The failure is
> 
> AssertionError: Lists differ: [(25, 'z'), (24, 'y'), (23, 'x... !=
> [(22, 'w'), (23, 'x'), (24, 'y...
> 
> First differing element 0: (25, 'z') (22, 'w')
> 
> - [(25, 'z'), (24, 'y'), (23, 'x'), (22, 'w')] + [(22, 'w'), (23,
> 'x'), (24, 'y'), (25, 'z')]
> 
> so the C-implemented BTrees return values in descending order, while
> the base Python implementation returns them in ascending order.  This
> happens on Python 2.7 as well, so it's not related to the COMPARE
> macro bug.
> 
> Does anybody know what the API of BTree.byValue() is supposed to be? 
> The interface just says "Return a sequence of (value, key) pairs,
> sorted by value", without clarifying if it means an ascending or a
> descending sort.

The docstring in the C module says:

    byValue(min) ->  list of value, key pairs

    Returns list of value, key pairs where the value is >= min.  The
    list is sorted by value.  Note that items() returns keys in the
    opposite order.

So I guess the descending order is intentional:  the call to '.reversed'
is certainly there.  Let's update the Python version to match, and use
your test to verify both.

Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlGg3YAACgkQ+gerLs4ltQ5eeACgywzDPvD33nkM4Skd05PSOLMR
WPcAn03amWB8zcsmU+eQKnZBZ6AbYr/d
=7AlH
-----END PGP SIGNATURE-----