[ZODB-Dev] Latest news: core dump with small change to POSException.py

Greg Ward gward@mems-exchange.org
Tue, 25 Sep 2001 17:47:45 -0400


You may recall that last week I reported Python dumping core after a
small change to ZODB/POSException.py.  Well, it's still happening with
the latest Python from CVS.  Situation:

  StandaloneZODB: latest from CVS, with two small changes [below]
  Python: latest from CVS

If I run "python test.py", it churns away, passing 300 or so tests, and
then dumps core.  I can narrow it down further than that, though:

$ python test.py -vv Conflict
BTrees.tests.testConflict 16
testFailMergeDelete (BTrees.tests.testConflict.TestIOSets) ... zsh: segmentation fault (core dumped)  /usr/local/bin/python2.2 test.py -vv Conflict

First, the two changes I've made:
  1) add a tiny, trivial constructor to ConflictError.  This is the
     change that causes the coredump; here's the patch:
--- ZODB/POSException.py        12 Apr 2001 20:47:00 -0000      1.7
+++ ZODB/POSException.py        25 Sep 2001 21:39:16 -0000
@@ -107,2 +107,5 @@
 
+    def __init__ (self, message):
+        self.args = (message,)
+
 class VersionError(POSError):

  2) add some print statements to BTrees/tests/testConflict.py
     -- this is where the coredump occurs, so I wanted to have
     some context.

I can actually narrow it down quite a bit more -- I wrote a tiny script
to get PyUnit out of the picture entirely (except for superclassing) and
run the one test case that's causing the core dump.  Here's the script:

-- snip ----------------------------------------------------------------
from BTrees.tests import testConflict 

case = testConflict.TestIOSets(methodName='testFailMergeDelete')
case.setUp()
case.testFailMergeDelete()
-- snip ----------------------------------------------------------------

With the print statements in testConflict.py, here's the output of that
script:

testConflict.test_merge:
  o1 = IOSet([-9282, -9254, -7660, -7538, -6378, -4928, -4737, -4477, -3992, -3733, -2487, -951, -94, -89, 1464, 1560, 1605, 7975, 9242, 9951])
  o2 = IOSet([-9254, -7660, -7538, -6378, -4928, -4737, -4477, -3992, -3733, -2487, -951, -94, -89, 1464, 1560, 1605, 7975, 9242, 9951])
  o3 = IOSet([-9254, -7660, -7538, -6378, -4928, -4737, -4477, -3992, -3733, -2487, -951, -94, -89, 1464, 1560, 1605, 7975, 9242, 9951])
  expect = IOSet([-9282, -9254, -7660, -7538, -6378, -4928, -4737, -4477, -3992, -3733, -2487, -951, -94, -89, 1464, 1560, 1605, 7975, 9242, 9951])
  message = 'merge conflicting delete'
  should_fail = 1
calling o1._p_resolveConflict()
zsh: segmentation fault (core dumped)  /usr/local/bin/python2.2 /tmp/t.py

Here are the first few frames from the C traceback reported by gdb:

#0  eval_frame (f=0x810f984) at Python/ceval.c:726
#1  0x8075400 in PyEval_EvalCodeEx (co=0x81934c0, globals=0x8198144, locals=0x0, args=0x8198964, 
    argcount=5, kws=0x8198978, kwcount=1, defs=0x81ae800, defcount=2, closure=0x0)
    at Python/ceval.c:2542
#2  0x80773c8 in fast_function (func=0x81ae7bc, pp_stack=0xbffff704, n=7, na=5, nk=1)
    at Python/ceval.c:3092
#3  0x80744d1 in eval_frame (f=0x81987f4) at Python/ceval.c:1991
#4  0x8075400 in PyEval_EvalCodeEx (co=0x8191f30, globals=0x8198144, locals=0x0, args=0x810caf8, 
    argcount=1, kws=0x810cafc, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:2542

Note that this is pretty much the same as I was seeing with Python 2.1
and 2.1.1 -- a coredump in ceval.c.  The line number in ceval.c
changed, but who's surprised at that?

Here's the relevant excerpt from Python/ceval.c:

		switch (opcode) {

		/* BEWARE!
		   It is essential that any operation that fails sets either
		   x to NULL, err to nonzero, or why to anything but WHY_NOT,
		   and that no operation that succeeds does this! */

		/* case STOP_CODE: this is an error! */

		case POP_TOP:
			v = POP();
			Py_DECREF(v);       <<<<--- LINE 726 
			continue;

OK, that's as much as I can do before my eyes glaze over completely.
I'd love to hear from someone who actually knows Python's guts (I know
you're out there...) on this.  Specifically, if you apply my tiny patch
to ZODB/POSException.py and run either the full test suite or my tiny
test-running script, do you also see this coredump?  It is 100% reliable
for me; it occurs with Python 2.1, 2.1.1, and current CVS.  And it goes
away completely if I change the ConflictError constructor from this:

  def __init__ (self, message):
    self.args = (message,)

to this:

  def __init__ (self, *args):
    self.args = args

And please don't tell me, "Well, do it that way then!"  Python shouldn't
dump core because of how some class constructor is defined!
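
A minimal sketch of the one difference between the two constructors
that can be shown from pure Python: how many arguments each accepts.
The class names StrictConflictError and LenientConflictError, the
POSError stand-in, and the helper try_construct are hypothetical; they
are not part of ZODB.  Whether this difference has anything to do with
the coredump is not established by anything above.  It may be a useful
place to start looking, nothing more.

```python
class POSError(Exception):
    """Stand-in for ZODB's POSError (assumption: a plain Exception subclass)."""


class StrictConflictError(POSError):
    # First variant from the post: exactly one positional argument.
    def __init__(self, message):
        self.args = (message,)


class LenientConflictError(POSError):
    # Second variant from the post: any number of positional arguments.
    def __init__(self, *args):
        self.args = args


def try_construct(cls, *args):
    """Return the new instance's args, or the name of the exception
    type raised if construction fails with a TypeError."""
    try:
        return cls(*args).args
    except TypeError as exc:
        return type(exc).__name__


# Construct each variant with zero, one, and two arguments.
for n_args in range(3):
    call_args = ("msg",) * n_args
    print(n_args,
          try_construct(StrictConflictError, *call_args),
          try_construct(LenientConflictError, *call_args))
```

The strict variant fails with a TypeError for any call that does not
pass exactly one argument, while the lenient variant accepts all three
calls.  So any caller that instantiates ConflictError with zero
arguments, or with more than one, would get a different exception than
it expects after the patch.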

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org