[Zope-dev] Confusing segfault for Zope2 head on RH7.3

Barry A. Warsaw barry@zope.com


Working on the Zope 2.7 branch I noticed a very strange crash while
running the test suite.  I re-checked out the Zope 2 cvs head, rebuilt
everything from scratch and I still get the crash.  I'm about out of
ideas so I thought I'd post what I know here in case anyone else can
confirm or give clues.

First off, I'm using a fresh checkout of the Zope 2 cvs head with a
virgin Python 2.1.3 (although a cvs head of the Python 2.1 maintenance
branch has the same problem).  I've rebuilt Python 2.1.3 with only the
-g option (no -O, although it makes no difference), and I've rebuilt
Zope using this command:

    % python2.1 setup.py build_ext -i

which also builds with just the -g option, although that also makes no
difference.  Note that I've also built using "python2.1 wo_pcgi.py"
and that also makes no difference.

I then run the test suite like so:

    % PYTHONPATH=lib/python python2.1 utilities/testrunner -a

and I reliably get a crash in ISO_8859_1_Splitter.c.  I've boiled it
down to the following simplified test case:

-------------------- snip snip --------------------crash.py
from Products.PluginIndexes.TextIndex.Splitter.ISO_8859_1_Splitter.ISO_8859_1_Splitter import ISO_8859_1_Splitter
x = ISO_8859_1_Splitter('hello world')
list(x)
-------------------- snip snip --------------------

run it like so:

    % PYTHONPATH=lib/python python2.1 crash.py

The top of the backtrace is:

#0  PyErr_SetString (exception=0x6e727574, 
    string=0x402660dc "Splitter index out of range") at Python/errors.c:69
#1  0x402658af in ISO_Splitter_item (self=0x843ed98, i=2)
    at Products/PluginIndexes/TextIndex/Splitter/ISO_8859_1_Splitter/src/ISO_8859_1_Splitter.c:357
#2  0x0808b98d in PySequence_List (v=0x843ed98) at Objects/abstract.c:1258
#3  0x080b934f in builtin_list (self=0x0, args=0x810137c)
    at Python/bltinmodule.c:1357
...

(I hacked the C code to rename Splitter_item() in
ISO_8859_1_Splitter.c to ISO_Splitter_item() for ease of setting the
break point.)

Here's the deal.  Splitter_item() cruises along as expected until it
hits the end of the text.  word is Py_None and it drops into the
clause that sets the IndexError exception.  AFAICT, there are no
reference counting bugs in the code.  At the PyErr_SetString() call
PyExc_IndexError is a perfectly valid PyObject*, but just one stack
frame later, inside PyErr_SetString(), the exception argument is a
completely bogus address.  PyErr_SetString() is not getting the same
object that Splitter_item() is providing in the first argument.  Note
that the second argument, a char* is just fine.
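
For anyone following along without the C source handy, the control flow
described above can be mimicked in pure Python.  This is only a sketch:
ToySplitter is a hypothetical stand-in, not the real splitter, and it
assumes nothing more than that list() keeps asking for item 0, 1, 2, ...
until the object raises IndexError.  With 'hello world' the failing
index is 2, which matches i=2 in frame #1 of the backtrace.

```python
class ToySplitter:
    """Hypothetical pure-Python stand-in for the C splitter (not the real code)."""

    def __init__(self, text):
        # Assumption: plain whitespace splitting is enough for illustration.
        self.words = text.split()

    def __getitem__(self, i):
        # Mirrors the end-of-text branch: no word left, so raise IndexError.
        if i >= len(self.words):
            raise IndexError("Splitter index out of range")
        return self.words[i]


x = ToySplitter('hello world')
result = list(x)   # list() stops quietly when __getitem__ raises IndexError
print(result)
```

So the IndexError at the end of the text is the normal way for the
sequence to say it is finished; the crash comes from how the exception
object arrives in PyErr_SetString(), not from the fact that one is
raised.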

Stepping through the code with gdb gives no help.  I can set break
points and inspect all the objects created and decref'd and everything
appears okay.  Printing PyExc_IndexError right before the call shows
nothing strange, but stepping into PyErr_SetString() shows the first
arg is corrupted.  I don't get it.  I've shown the other PLabbers and
it perplexes them too. :)

To make matters weirder, we've tested this on various Linux flavors
and versions.  On RH6.1 and MD8.{0,1} it works just fine -- no crash.
The /only/ system on which I've seen the crash is RedHat 7.3, and there
it crashes reliably, every time, on both of my up-to-date RH7.3 systems.

I think I've eliminated gcc as a culprit because most of our Linux
boxes are using the same gcc version, according to gcc -v:

Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)

i.e. 2.96 20000731.

The /only/ difference that I can tell is the glibc version.  My RH7.3
systems have glibc 2.2.5 and all the other systems have older glibcs;
no system with glibc 2.2.4 or earlier exhibits the crash.

I have not tested gcc 3.1 and I have not tried other glibc's on
RH7.3.  I'd really like to know if anybody else on RH7.3 either sees
the same crash or doesn't.  And if so <wink> what version of gcc and
glibc you've got.  I'm starting to suspect a glibc bug, but it's dang
strange.
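
If it helps when reporting back, here is a rough sketch for collecting
the version details in one go.  It assumes a Python recent enough to
ship the standard platform module (so not the 2.1.3 used above), and
describe_build() is just a name I made up for illustration.  Note that
it reports the compiler that built the interpreter, which may not be the
one that built the Zope extensions, and the libc field can come back
empty on systems where the C library cannot be identified.

```python
import platform
import sys


def describe_build():
    """Collect interpreter, compiler and C library versions (hypothetical helper)."""
    # libc_ver() inspects the running interpreter binary by default.
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": sys.version.split()[0],
        "compiler": platform.python_compiler(),
        "libc": (libc_name + " " + libc_version).strip(),
    }


info = describe_build()
for key in ("python", "compiler", "libc"):
    print("%-8s %s" % (key, info[key]))
```

The output of gcc -v is still the better source for the compiler
version; this only saves some typing.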

-Barry