[ZODB-Dev] Daemon manager design issues

Guido van Rossum guido@python.org
Mon, 11 Nov 2002 09:34:06 -0500


[Guido]
> > - There needs to be a way to stop the daemon manager (and the daemon
> >   application).  Shall I do this with a signal

[Toby]
> The daemon manager needs to handle SIGINT and SIGTERM to shut itself
> down cleanly, because these signals are sent by things outside our
> control.

Sure.
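
A minimal sketch of the clean-shutdown handling discussed above, assuming the manager runs a main loop that can poll a flag. The names `ShutdownFlag` and `install_shutdown_handlers` are hypothetical and are not taken from zdaemon.

```python
import signal


class ShutdownFlag:
    """Records that a termination signal arrived; the main loop polls it."""

    def __init__(self):
        self.requested = False
        self.signum = None

    def handle(self, signum, frame):
        # Only record the request here; the real cleanup (stopping the
        # application, removing the control socket) belongs in the main loop.
        self.requested = True
        self.signum = signum


def install_shutdown_handlers(flag):
    """Route SIGINT and SIGTERM to the flag instead of the default action."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, flag.handle)
```

The main loop would then check `flag.requested` on each iteration and shut down the application before exiting.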

> >  or with a separate
> >   utility that talks to the daemon manager, perhaps through a
> >   Unix-domain socket?
> 
> This is what daemontools does, and I find that works well.

Yes, I'll do this.
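
A rough sketch of how a separate control utility could talk to the manager over a Unix-domain socket. The line-based text protocol, the socket path, and all function names here are assumptions made for illustration; the thread does not specify any of them.

```python
import socket


def open_control_socket(path):
    """Create a listening Unix-domain socket at `path` (hypothetical location)."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


def answer_one_command(server, handler):
    """Manager side: accept one client, hand its command to `handler`, reply."""
    conn, _ = server.accept()
    with conn:
        command = conn.recv(1024).decode("ascii").strip()
        conn.sendall((handler(command) + "\n").encode("ascii"))
    return command


def send_command(path, command):
    """Control-utility side: send one command and return the manager's reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall((command + "\n").encode("ascii"))
        return client.recv(1024).decode("ascii").strip()
```

In this sketch the manager would call `answer_one_command` from its main loop, and a command such as "stop" or "restart" would be mapped by `handler` onto the corresponding action.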

> > - Ditto for restarting the daemon application.  I guess this has to
> >   kill the application with a signal
> 
> yes, and yes
> 
> > unless we want to get fancy and
> >   decide on a separate parent-child protocol (which I think would be
> >   overkill).  We could do this in three ways:
> >
> >   - Use a separate utility as described above.
> 
> Do you mean that the "daemon manager control" utility communicates
> with the daemon manager process using a socket (or whatever), then
> the daemon manager sends a SIGTERM directly to the application? If
> so, +1

OK.
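
A sketch of the manager side of that arrangement, assuming the manager started the application with `subprocess.Popen` and therefore holds its process object, so no pid file is involved. The function name, the timeout, and the escalation to SIGKILL are assumptions, not something agreed in this thread.

```python
import signal
import subprocess


def stop_application(proc, timeout=10.0):
    """Send SIGTERM to the managed child and wait for it to exit.

    Returns the child's return code; on POSIX a negative value -N means
    the child was killed by signal N.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption: escalate to SIGKILL if the child ignores SIGTERM.
        proc.kill()
        return proc.wait()
```

Because the manager is the parent of the application, it always knows the current child, which avoids the stale pid file window mentioned below.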

> >   - Send the signal directly to the application (this means there has
> >     to be a file with the application's pid; the daemon can write
> >     this);
> 
> If you mean that the user should signal the managed application
> directly, then -1. This opens up the same race conditions that
> Zope/ZEO currently suffers from, because there is a small time
> window where the pid file on disk is stale during a restart.

I'm not sure that the race is important (our sysadmins typically keep
sending signals until it dies :-) but I agree it's better to do this
via the daemon manager.

> > - Logging is configured through environment variables, which are
> >   passed on to the application.  Is there a need to be able to
> >   configure the manager's logging separately from the application, or
> >   is it okay that the manager always logs to the same file as the
> >   application?
> 
> I suspect this would be useful, but I would be happy provided I can
> get them both to target syslog.

Well, if you figure out how to configure zLOG to target syslog, let me
know.  All I know is STUPID_LOG_FILE, and I'm not going to invent yet
another logging mechanism (rather, I expect that things will get
better once we can use the standard Python logging module; see PEP
282).
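
For reference, a minimal sketch of what targeting syslog could look like with the standard logging module, which provides `logging.handlers.SysLogHandler`. This does not show how to configure zLOG; the function name, the logger name, the format string, and the default address are all assumptions.

```python
import logging
import logging.handlers


def make_syslog_logger(name, address=("localhost", 514)):
    """Return a logger whose records are sent to a syslog daemon.

    `address` is an assumption: a (host, port) pair uses UDP; many Unix
    systems also accept a local socket path such as "/dev/log".
    """
    logger = logging.getLogger(name)
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With something like this, the manager and the application could each create a logger under a different name and still end up in the same syslog stream.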

> > Important: if at any point the application exits with exit status 2,
> > it is not restarted.  Any other form of termination (either being
> > killed by a signal or exiting with an exit status other than 2) causes
> > it to be restarted.
> 
> What is the use case for this? 

I presume you're asking why I'm making an exception for exit 2.  This
is the conventional exit status when there is a command line syntax
error, since, oh, Unix v7 or so.  I don't see how restarting could fix
such an error.  Now, I know that not all tools use this convention,
but AFAIK enough do to make it useful to watch out for it.

> (if we need it, then can we make this behaviour optional please?)

What's the use case for making it optional?  Do you have any tools
that use exit 2 to signal "please restart me"?

I realize I forgot something though -- exit 0 should also be taken as
"don't restart".  This is consistent with the current zdaemon
behavior.  I assume this is so that when you use the ZMI to shut down
Zope, the daemon manager won't restart it.
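
The restart rule as stated so far can be sketched as a small predicate. It assumes the return code convention of Python's `subprocess` module, where a negative value means the child was killed by a signal; the names `NO_RESTART_STATUSES` and `should_restart` are hypothetical.

```python
# Exit statuses that mean "do not restart": 0 (deliberate clean shutdown)
# and 2 (conventional status for a command line syntax error).
NO_RESTART_STATUSES = frozenset({0, 2})


def should_restart(returncode):
    """Decide whether the manager should start the application again.

    `returncode` follows subprocess.Popen.wait(): negative means the
    child was killed by a signal, otherwise it is the exit status.
    """
    if returncode < 0:
        return True  # killed by a signal: always restart
    return returncode not in NO_RESTART_STATUSES
```

For example, `should_restart(1)` and `should_restart(-15)` are true, while `should_restart(0)` and `should_restart(2)` are false.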

[Jeff Rush]
> > > I'd like to see an approach based on /sbin/init, where
> > > something like /etc/inittab lists instances and I can
> > > switch Zope 'runlevels' to bring up/down groups of
> > > instances.

[Toby]
> I think it would make sense to have this implemented as a layer on
> top of Guido's "daemon manager". We could probably even do this as
> shell scripts.

Good.  I've looked at a design for Jeff's multi-daemon manager, and
I've decided that it would be too ambitious right now.  We may get
back to that at a later point, unless I find it's significantly less
work than it appears right now.

> > > And the separate utility might have status reporting
> > > features, giving a snapshot of servers up/down like
> > > ps or top.
> >
> > Yes, but it won't be able to tell you much more than whether the
> > process exists or not 
> 
> I think this should be tristate:
> 1. Manager process down
> 2. Manager process up, application down (see below if you don't
>    think this state matters)

I agree it matters -- also because of the automatic backoff when the
application dies quickly.

> 3. Manager process up, application up
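
The three states listed above could be represented as follows. This is only a sketch; the enum and function names are hypothetical, and how the status tool actually learns the two booleans (for instance by asking the manager over its control socket) is left open.

```python
from enum import Enum


class Status(Enum):
    MANAGER_DOWN = 1          # no manager process, so no application either
    MANAGER_UP_APP_DOWN = 2   # e.g. the manager is backing off before a restart
    MANAGER_UP_APP_UP = 3


def classify(manager_running, app_running):
    """Map the two observations onto the three reportable states."""
    if not manager_running:
        return Status.MANAGER_DOWN
    if app_running:
        return Status.MANAGER_UP_APP_UP
    return Status.MANAGER_UP_APP_DOWN
```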

> > > [1] basic protection against starting the same instance twice;
> > >     the current zdaemon doesn't stop this
> >
> > Question, how do you know that the same "instance" is already up,
> > without building in a lot of knowledge about the application?
> 
> Using the status reporting tool. We protect against two copies of
> the manager process, *not* two copies of the application.

But how do two copies of the manager process know that they are
managing the same application?  I expect that this daemon manager will
be used to manage Zope as well as the ZEO storage server, and possibly
other things as well (Squid? and the ZRS product).  Also, there may be
multiple ZEO storage servers!  (You *can* run these in one process,
but that's not always a good idea, and multiple machines may be
overkill when the separation is for conceptual reasons.)

> > > [2] You may already do this (I haven't checked your source) but
> > >     zdaemon ought to 'cd' into the Zope/var directory to avoid
> > >     unnecessarily holding onto directories.
> >
> > Good point.  The question is, how to decide which directory to
> > chdir into without building in too much knowledge about the
> > application.  I want this to be fully general.
> 
> daemontools has a working directory for the manager process, which
> contains the application start script, its equivalent of our default
> runlevels, and the manager control socket. I find this approach
> works well. I think it would be fair for the manager process to
> chdir into there, and leave Zope to chdir into Zope/var.

Can you suggest a default directory to pick for the daemon manager?
I'd prefer to require as little configuration and setup as possible.
Maybe I should just chdir into /tmp (really tempfile.gettempdir())?
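
A sketch of that default, assuming an optional override for sites that do want to configure a directory. The function name and the `directory` parameter are hypothetical.

```python
import os
import tempfile


def enter_working_directory(directory=None):
    """chdir into `directory`, or into the system temp directory by default.

    Moving off the start-up directory keeps the manager from holding on
    to whatever directory it happened to be launched from.
    """
    target = directory if directory is not None else tempfile.gettempdir()
    os.chdir(target)
    return target
```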

> Another nice feature from daemontools is for the daemon manager
> process to be able to bring the application down without restarting
> it, but the manager remains running. This can be used to implement
> the runlevel changing tool, provided it is ok to leave the manager
> processes running in the 'wrong' runlevel.

If you think this is useful, it's easily done.

> > > BTW, you may want to look at Dan Bernstein's daemontools for
> > > ideas; they provide a framework for starting and stopping daemon
> > > processes.  (You might even consider using daemontools, but like
> > > Bernstein's other tools the directory organization is a bit
> > > eccentric

> > Someone posted here earlier saying they were lacking something; I
> > forget what.
> 
> That was me:
> http://lists.zope.org/pipermail/zope-coders/2002-November/002634.html
> I can recommend looking at this package for *ideas*, even if you
> don't end up using the tools.

Will do.

> > > and the license may be a problem.
> 
> public domain?

Some lawyers don't believe in the public domain. :-)

Anyway, the source looks complicated enough to want to have a Python
version we can actually understand and hack.

--Guido van Rossum (home page: http://www.python.org/~guido/)