[Zope] Re: Unexplained restart on add

Sat, 13 Jul 2002 08:19:14 -0700

Thanks for the ideas on this (individual comments
at end).  I've localized the problem a bit, but I
still don't understand it -- here's hoping this
rings a bell with somebody. :-D

I've tracked the problem down to a "manage_upload()" 
call from a class extending "Image" (i.e. I've 
inherited "manage_upload" from the OFS.Image.File
class in Zope).  It happens on the very first call,
so this must be a deterministic bug and not a
resource-limit or other cumulative problem.  (I just
tracked this down with breakpoints in my product
code).
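
For anyone localizing a crash like this one, which leaves no
traceback, here is a minimal sketch of a "breadcrumb" log that is
forced to disk around the suspect call. It uses current Python 3
syntax, not the Python 2.1.3 discussed in this thread. The names
breadcrumb, traced_call and fake_upload are hypothetical, and
fake_upload only stands in for the real manage_upload() call.

```python
import os
import tempfile


def breadcrumb(log, message):
    """Write one line and force it to disk before returning."""
    log.write(message + "\n")
    log.flush()
    os.fsync(log.fileno())


def traced_call(log, label, func, *args, **kwargs):
    """Leave a marker before and after calling func."""
    breadcrumb(log, "before " + label)
    result = func(*args, **kwargs)
    breadcrumb(log, "after " + label)
    return result


def fake_upload(data):
    # Hypothetical stand-in for the real manage_upload() call.
    return len(data)


fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
with open(log_path, "a") as log:
    size = traced_call(log, "manage_upload", fake_upload, b"GIF89a")
with open(log_path) as log:
    lines = log.read().splitlines()
os.remove(log_path)
print(size, lines)
```

If the process dies inside the call, the log should end with the
"before" line and no matching "after" line.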

I've applied the patch to Python 2.1.3 to deal with
the stack size problem on FreeBSD.  I've also eliminated
ZEO, so I'm now using a regular FileStorage on the
server.  So there are fewer unknowns now.

The call to manage_upload is given a Python file
object (I've verified that it has valid values for
f.name, f.mode, and f.fileno()). The file in question
exists on the filesystem (in a subdirectory under
the product directory):

/usr/local/zope/instance/Products/Narya/www/Emoticon
-rw-rw-rw-  1 nobody  nobody   496 Apr  7 03:18 em_angry.gif

What's more, I had (somewhat inefficiently) read in
this entire file's contents using a regular f.read()
method without causing any errors, right before making
the call to manage_upload. Just for grins, I added
f.seek(0) after that to make sure it was positioned
at the beginning of the file. No joy, though.
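
The checks described above (name, mode, fileno, a full read, then a
rewind) can be sketched roughly as follows. This is only an
illustration in current Python 3 syntax: describe_file_object is a
hypothetical helper, and a 496-byte temporary file stands in for
em_angry.gif. It does not call manage_upload() itself.

```python
import os
import tempfile


def describe_file_object(f):
    """Collect the attributes checked above, then rewind the file."""
    data = f.read()   # read the whole file, as in the post
    f.seek(0)         # rewind so the next consumer starts at byte 0
    return {
        "name": f.name,
        "mode": f.mode,
        "fileno": f.fileno(),
        "size": len(data),
        "position": f.tell(),
    }


# Stand-in for em_angry.gif: a 496-byte temporary file.
fd, path = tempfile.mkstemp(suffix=".gif")
os.write(fd, b"GIF89a" + b"\x00" * 490)
os.close(fd)

with open(path, "rb") as f:
    info = describe_file_object(f)
os.remove(path)
print(info)
```

A position of 0 after the read confirms the rewind took effect
before the file object is handed on.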

(I have tried variations on the file permissions and
ownership -- this seems like the most "wide-open" case,
in an effort to rule out permission problems).

This works just fine on my development server, but fails
on the production server as I described (capsule below).

I'm not really too clear on how manage_upload() works --
what sort of things does it have to be able to do? Am
I correct in feeding it the file object? (and if not,
why did it work before, I wonder?).

The story so far...
Terry Hancock (I) wrote:
> The problem is that, although the product shows up
> in the product control panel okay, attempting to
> add the main object from the product into a folder
> causes the server to restart (some of the lesser
> objects defined by my product will add without
> problems).
> 
> To make matters worse, it does this without any kind
> of explanation: no traceback, no log messages [...]
> 
> There are, of course, lots of little differences:
> * I'm running Debian Linux and they are running FreeBSD
>   (I think), though both are Intel architecture and
>   the packages are installed from source. (I'm not
>   using the Debian Zope package, but one downloaded
>   from www.zope.org).
> 
> * I run Zope as a special user, while they have it
>   starting as "root" (which means it should run as
>   "nobody" IIRC).
> 
> * There are some products in their Zope install that
>   aren't in mine -- a hot fix, and some other, apparently
>   unrelated things.

Thank you very much for the recommendations so far...

Jaroslav Lukesh wrote:
> Is your machine health OK?
> Is data on the disk drive OK?
> Is your bus system OK (without hazards)?
> Do you run memtest86 (www.memtest86.com) and cpuburn test from Robert
> Redelmeier (search www.freshmeat.net) for 24 hours (2x BurnBX 2xBurnMMX
> 2xBurnP6 - depends on your CPU)  without error?
> Did you compile kernels in 10 parallel tasks continuously for 24h without
> binary difference?

Ack! I think those would violate my usage agreement! This
is someone else's computer, and a production Zope server.
Anyway, I think the fact that the error is so deterministic
rules out hardware issues (which are generally unpredictable).

Charlie Reiman wrote:
> With all due respect, these machines don't sound nearly identical at all.
> Having said that, I can provide a little help.

Yeah, well, we try. I just meant I'm running the same version
of Zope and Python, so it ought not to be a version compatibility
problem.  I actually tried installing FreeBSD on my development
server, but I'm so much more familiar with Debian (especially the
install), so I stuck with that. I emphasize this, because with
my product being brand new, it's obviously my code that's most
suspect! ;-D

> The mysterious restarting is from the -Z option in the start script. Disable
> the debugging option (-D) and enable the watchdog (-Z watchdog.pid) on your
> development server. You will now have a watchdog zope monitoring and
> restarting the actual working zope (when it dies, of course). Check the
> source in z2.py for all the startup options.

Thank you! However, the funny thing is, z2.py *isn't* being called
with the -Z flag.  But I'll ask the folks who set it up about that. ;-D

> My suspicion is that you need to look into permissions. Your product might
> be doing something that it can't do when run as nobody.

This still seems the most suspicious. However, haven't I proven that
the program can access the file?  After all, the permissions are now
"wide open", the file is owned by "nobody" and I'm able to read in
the data with a regular f.read() operation, so it *can't* be a
permissions problem, can it?
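
One way to put that reasoning on record is to print which user the
process is actually running as, next to the file's owner and mode.
A minimal sketch, assuming a Unix system and current Python 3
(os.getuid and os.geteuid are Unix-only, and stat.filemode is not
in Python 2.1.3); access_report is a hypothetical helper and a
temporary file stands in for the real image. It only covers read
access to one file, not anything else the server may need to write.

```python
import os
import stat
import tempfile


def access_report(path):
    """Summarize who is asking and what the file allows."""
    st = os.stat(path)
    return {
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "owner_uid": st.st_uid,
        "mode": stat.filemode(st.st_mode),
        # Note: os.access() tests against the real uid, not the effective one.
        "readable": os.access(path, os.R_OK),
    }


# Stand-in file with the same "wide open" mode as in the post.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o666)
report = access_report(path)
os.remove(path)
print(report)
```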

I tried starting up the production server as a regular user,
but it wouldn't run -- I didn't try too hard -- I suspect they
might've tried to block this sort of thing for security reasons
(and I don't really want to run it that way -- I'm just trying
to track down the problem).

Jens Vagelpohl wrote:
> there is a known bug in python for (at least) FreeBSD that leads to sudden
> restarts. it has to do with the stack size for threads being too small.
> see this message::
> 
> http://groups.yahoo.com/group/zope/message/91934

Chris McDonough wrote:
> If you're on BSD, this is likely a thread stack space issue.  I
> can't find detailed instructions on how to make it better, but by
> default FreeBSD (as well as apparently Mac OS X) has a stack space
> of 64K, which is too small for many heavily recursive applications.
> I'd search the maillists for things like "stack size" "stack space",
> "bsd stack" etc.

"Matthew T. Kromer" wrote:
> Apologies for the attachment, but this is a tiny patch you can apply to
> Python 2.1.3 to double the stack size for threads up to 128K.
> [Attachment: pthread.patch, Plain Text (text/plain), 7bit encoding]

Did it. Even though I don't think this was a problem
(it certainly did not fix the crashing), it's probably a good
precautionary measure anyway. Thanks a lot for the patch
and the information about it. I'm pretty much a complete
newbie on BSD systems.

Still in the dark ...
Terry

-- 
------------------------------------------------------
Terry Hancock
hancock@anansispaceworks.com       
Anansi Spaceworks                 
http://www.anansispaceworks.com 
P.O. Box 60583                     
Pasadena, CA 91116-6583
------------------------------------------------------