reloading modules (was Re: [Zope3-dev] Re: Google SoC Project)
Shane Hathaway
shane at hathawaymix.org
Tue May 9 14:11:16 EDT 2006
Adam Groszer wrote:
> What about pushing the problem then to the lower level, to Python
> itself. I think all developers are fighting the same problem, so all
> Python developers would benefit from the solution. As far as I know (I
> may be wrong), few if any languages support that, so it would also be
> a big plus for Python.
>
> As I don't have really deep knowledge of the Python interpreter
> itself, I cannot judge how weird the idea is. Maybe we should ask
> Guido for his thoughts on it.
I've spent time thinking about this. Modern operating systems are
surprisingly good at reloading processes, but in general, it's hard to
reload pieces of a process. What's the difference?
I think the difference is in the type of interdependence. Operating
systems force processes to talk to each other through high level
mechanisms like files, streams, sockets, memory mapped I/O, and so on.
Good programmers understand that processes can die and thus make their
software resilient to communication channel interruptions.
Within a process, programmers have no such expectation. Once the
programmer imports a module, the programmer expects the imported module
to remain unchanged. Programmers rarely think of modules as
communicating with each other at all. A sticky morass of inter-module
pointers quickly forms, leaving little hope of reliably reloading
arbitrary modules. The operating system has to intervene in order to
start the process over.
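
To make the "inter-module pointers" problem concrete, here is a minimal sketch. It assumes a made-up module named x, built in memory so nothing has to exist on disk, and it imitates a reload by re-running source in the module's existing namespace, which is roughly what Python's reload does. The helper simulate_reload is hypothetical.

```python
import types

def simulate_reload(module, source):
    # Like reload(): re-run the source in the existing module namespace.
    exec(source, module.__dict__)

# Hypothetical module "x", built in memory so the sketch needs no files.
x = types.ModuleType("x")
simulate_reload(x, "def greet():\n    return 'old'\n")

# What `from x import greet` does in some other module: copy the pointer.
greet = x.greet

simulate_reload(x, "def greet():\n    return 'new'\n")

via_module = x.greet()   # looks the name up again after the reload
via_pointer = greet()    # still the function object from before the reload
```

After the simulated reload, via_module sees the new function while via_pointer still calls the old one. Every such copied reference is one strand of the morass.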
Shared memory makes it possible to link processes at a deeper level, but
in practice, shared memory is used mostly for threading. It's no
coincidence that multiple threads are generally thought of as a single
process that has to restart together. Once two processes share
pointers, it's hard to unbind them.
So I have considered two basic approaches for reliably reloading a module:
1) Code the reloadable module as a pure communication endpoint, treating
the module almost like a process. No other modules should import from
the module; instead, the module should register itself with a framework
and other modules should talk to the module only through that framework.
This is a good approach for writing reloadable application-specific
plugins. You can also support clusters of modules that represent a
single plugin.
The Zope 2 refresh mechanism works quite well with products written this
way. Unfortunately, keeping modules free of interdependencies is
difficult, and that's a major support risk.
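
A rough sketch of approach 1, under some loud assumptions: PluginRegistry, load_plugin, and the plugin sources are all invented for illustration and are not Zope 2's actual refresh API. The point is only that callers never import from the plugin; they hold a reference to the framework and look the plugin up by name on every call.

```python
class PluginRegistry:
    """Hypothetical framework that plugins register themselves with."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # A later registration under the same name replaces the earlier one.
        self._handlers[name] = handler

    def call(self, name, *args, **kwargs):
        # Look the handler up at call time, so callers never keep a pointer.
        return self._handlers[name](*args, **kwargs)


registry = PluginRegistry()

def load_plugin(source):
    # Run plugin source in a fresh namespace that can see only the registry.
    exec(source, {"registry": registry})

PLUGIN_V1 = """
def render(name):
    return "Hello, " + name
registry.register("render", render)
"""

PLUGIN_V2 = """
def render(name):
    return "Hi there, " + name
registry.register("render", render)
"""

load_plugin(PLUGIN_V1)
first = registry.call("render", "Zope")

load_plugin(PLUGIN_V2)   # "reload": the new code re-registers itself
second = registry.call("render", "Zope")
```

Because nothing outside the plugin holds a reference to render, replacing the plugin is just a matter of running the new source. The discipline breaks as soon as some module does import from the plugin directly, which is the support risk mentioned above.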
2) Make reloadable code fundamentally different. If module X is
supposed to be reloadable, and X creates a module-level global variable
Y, and module Z imports Y, then Y needs to be decorated in such a way
that Z's view of Y can change automatically when X is reloaded.
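
One way to picture the "decoration" in approach 2 is a late-binding reference. This is only a sketch: the Reloadable class is hypothetical, module X is built in memory, and the reload is imitated by re-running source in X's namespace.

```python
import types

class Reloadable:
    """Hypothetical late-binding reference to a module-level name."""

    def __init__(self, module, name):
        self._module = module
        self._name = name

    def get(self):
        # Resolve the name every time, instead of caching the object.
        return getattr(self._module, self._name)


# Module X defines the global Y.
x = types.ModuleType("x")
exec("Y = 10", x.__dict__)

# Module Z holds a reference to "whatever X.Y currently is", not to 10.
y_ref = Reloadable(x, "Y")
before = y_ref.get()

exec("Y = 20", x.__dict__)   # simulated reload of X with a new value
after = y_ref.get()
```

Z's view follows X across the reload, at the cost of Z having to write y_ref.get() everywhere instead of plain Y.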
This second approach has subtle limitations, though. What if Y has the
value 10 and Z defines a global variable A whose value is (Y**2)? The
value of A might need to change when Y changes, but how can we arrange
for that to happen without making a mess of the code? I doubt there's
any reasonable general solution.
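
The Y**2 case can be shown in a few lines. As before, module X is a made-up in-memory module and the reload is simulated; current_a is a hypothetical workaround, not a general solution.

```python
import types

# Module X defines Y.
x = types.ModuleType("x")
exec("Y = 10", x.__dict__)

# Module Z computes A once, at import time.
A = x.Y ** 2

exec("Y = 20", x.__dict__)   # simulated reload of X

stale = A                    # computed from the old Y, never updated

def current_a():
    # Workaround: recompute on demand instead of storing the result.
    return x.Y ** 2

fresh = current_a()
```

Nothing records that A was derived from Y, so nothing can update it. Turning every derived global into a function fixes this one case, but that is exactly the kind of mess in the code the paragraph above worries about.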
Even more subtle is what happens when a reloadable module holds a
registry of things imported from other modules. When the module is
reloaded, should the registry get cleared? Zope 2's refresh says the
registry should be cleared, but in practice, this confuses everyone.
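
Here is a sketch of the registry question. Both module sources are invented for illustration; the first loses its contents when its code is re-run, and the second uses a common keep-if-already-defined idiom. Which behavior is right is a policy choice, and this sketch does not settle it.

```python
import types

CLEARING = '''
registry = {}

def register(name, obj):
    registry[name] = obj
'''

PRESERVING = '''
try:
    registry            # already defined by an earlier run: keep it
except NameError:
    registry = {}

def register(name, obj):
    registry[name] = obj
'''

def run(module, source):
    # Simulated (re)load: execute the source in the module's namespace.
    exec(source, module.__dict__)

# A module whose reload wipes what other modules registered.
a = types.ModuleType("a")
run(a, CLEARING)
a.register("adapter", object())   # done by some other module at import
run(a, CLEARING)                  # reload
cleared_count = len(a.registry)

# The same module written so that the registry survives a reload.
b = types.ModuleType("b")
run(b, PRESERVING)
b.register("adapter", object())
run(b, PRESERVING)                # reload
kept_count = len(b.registry)
```

In the clearing version the other modules are never re-imported, so their registrations silently vanish. In the preserving version stale entries from code that no longer exists can linger instead. Neither is obviously correct.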
To solve this, I think reloadable modules need to have a special global
namespace. Everything in the global namespace, as well as everything
reachable from the global namespace, must be explicit about what happens
at the time another module imports it or the module is reloaded. I
think this could make a refresh mechanism like the one in Zope 2
reliable. It has a lot of similarity with persistent modules, but it
might be simpler. I haven't thought it all the way through. The idea
came to me about halfway through this post. :-)
Shane