[Python-Dev] Re: [Zope3-dev] Zip import and sys.path manipulation (was Re: directory hierarchy proposal)

Guido van Rossum guido@python.org
Mon, 16 Dec 2002 14:41:55 -0500


[Please try to limit the width of your lines to 72 characters or so.]

> FWIW, I think it's generally a bad idea to depend on module.__file__
> and pkg.__path__ to find data files.

Of course you think this, because you just recently invented something
that would break this assumption. :-)

The problem is that Zope3 needs *some* way of storing data and config
files related to specific packages, and there isn't really any other
convenient and conventional way except storing it in a package
directory and using the package's __file__ to find it.

Python's own test package uses this, and so does the pdb module for
its help file (the latter actually scans sys.path for a file pdb.doc).
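
A minimal sketch of the convention described above, in which a package
finds a data file stored next to its own code by way of __file__.
The helper name data_path, the path /srv/app/pkg/__init__.py, and the
file name defaults.cfg are all made up for illustration, and the
approach assumes the module really was loaded from a directory on
disk (not from a zip file or a frozen image).

```python
import os


def data_path(module_file, name):
    """Return the path of `name` in the directory holding module_file."""
    here = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(here, name)


# Inside a real package you would pass the module's own __file__.
# A literal path stands in for it here so the sketch runs anywhere.
module_file = "/srv/app/pkg/__init__.py"
cfg = data_path(module_file, "defaults.cfg")
print(cfg)
```

Code written this way fails as soon as __file__ stops naming a real
file, which is exactly the compatibility concern under discussion.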

> Playing tricks with the contents of pkg.__path__ apparently has its
> uses,

It's the *defined* API for adding to the set of locations from which
submodules and subpackages of a package are to be loaded.  You can't
just call that "playing tricks."
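
To make the point concrete, here is a small sketch of extending
__path__ so that a submodule is loaded from a second directory.  The
names demo_pkg, extras and plugin are hypothetical, and the package is
built in a temporary directory only so that the example is
self-contained.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package plus a separate directory of extensions.
base = tempfile.mkdtemp()
pkg_dir = os.path.join(base, "demo_pkg")
extra_dir = os.path.join(base, "extras")
os.makedirs(pkg_dir)
os.makedirs(extra_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(extra_dir, "plugin.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, base)
importlib.invalidate_caches()

import demo_pkg

# Add a second location in which submodules of demo_pkg may be found.
demo_pkg.__path__.append(extra_dir)

# plugin.py lives outside the package directory but imports as a
# submodule because extra_dir is now on demo_pkg.__path__.
plugin = importlib.import_module("demo_pkg.plugin")
print(plugin.VALUE)
```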

> but I think that (in general) it's a bad idea as
> well. module.__file__ is mostly an introspective aid, and
> pkg.__path__ should IMO be seen as merely an implementation detail.

Not true, see above -- it's a defined API for a specific purpose.

That's less so for __file__, but absent a well-defined alternative
API, __file__ should be supported.

I'm not saying that __file__ should always point to a file --
obviously it can't, if we want import from zip files.  But there's
considerable existing code that uses it in certain ways and we
shouldn't break that unless necessary.

> module.__file__: in a frozen module it will be set to
> "<frozen>". Expect __file__ to be a path to a file and you're
> screwed.

Of course.  If you freeze code, you have to make sure that it will be
able to find its data files in some other way.  But not all code has
to be freezable.

> To the lower levels of the import mechanism, the *existence* of
> pkg.__path__ is all that's looked at: "hey, it's a package". Then
> there's freeze again: a frozen package has a __path__ variable, but
> it's not a list, it's a *string*. Only when an import goes through a
> sys.path item, __path__ is (more or less) guaranteed to be a list.

Yeah, frozen code can't work with pkgutil as it currently stands.  But
I think pkgutil is pretty meaningless for frozen code: pkgutil is a
mechanism to allow extensions of a package to be installed in
different directories.  A frozen application should gather all
code belonging to a package in a single directory, and freeze that;
then all we need to ensure is that calling pkgutil.extend_path()
doesn't bomb out, and that's easily done by adding

  if not isinstance(path, list):
      return path

to the start of extend_path().
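
As a sketch of where that guard would sit, here is a simplified
stand-in for extend_path().  It is not pkgutil's actual code: the
name extend_path_sketch is hypothetical, and the directory scan is
reduced to the bare minimum needed to show the control flow.

```python
import os
import sys


def extend_path_sketch(path, name):
    """Simplified stand-in for pkgutil.extend_path() with the guard."""
    # The proposed guard: a frozen package may have a string (or
    # something else) as __path__, so hand it back untouched.
    if not isinstance(path, list):
        return path

    path = path[:]  # work on a copy, leave the caller's list alone
    subdir = os.path.join(*name.split("."))
    for entry in sys.path:
        if not isinstance(entry, str):
            continue
        candidate = os.path.join(entry, subdir)
        if os.path.isdir(candidate) and candidate not in path:
            path.append(candidate)
    return path


# A non-list __path__ passes through unchanged instead of raising.
frozen_path = extend_path_sketch("<frozen>", "somepkg")
```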

> The sys.meta_path import hook mechanism in my patch (the idea is
> stolen from Gordon McMillan) acts on the same level as builtin
> module imports and frozen module imports: it doesn't need
> sys.path. So it doesn't need any meaningful object as pkg.__path__
> either. I just uploaded a new version of the patch; it now contains
> a test_importhooks.py script, which has a sys.meta_path test case
> which actually sets pkg.__path__ to None. Works like a charm. Here's
> a (slightly modified) comment from the test script:

Does the metahook also apply to submodules (or subpackages) of
packages?  I'd expect not.  (I haven't had the time to review your
latest patches yet, in part due to this thread. :-)  Surely you
shouldn't be looking for builtin submodules of a package.

>     Depending on the kind of importer, there are different
>     levels of freedom of what you can use as pkg.__path__.
>     
>     Importer object on sys.meta_path:
>         it can use anything it pleases (even None), as long
>         as a __path__ variable is set.

But I can imagine that some metahooks would like to look inside the
__path__ list for more hints on where to find the submodules of the
package.  Of course that's up to the metahook.
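
A sketch of such a metahook follows.  It uses today's importlib
protocol (find_spec / exec_module), which is an assumption: the patch
under discussion may well use a different interface.  The names
DictFinder, SOURCES, memdemo and the "<memory>" marker are all
hypothetical.  The finder serves modules from a dictionary and, for
submodules, consults the parent's __path__ for its own marker before
claiming the import.

```python
import importlib.abc
import importlib.util
import sys

# name -> (source code, is_package); purely illustrative content
SOURCES = {
    "memdemo": ("", True),
    "memdemo.sub": ("ANSWER = 42\n", False),
}


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    MARKER = "<memory>"

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in SOURCES:
            return None
        # For a submodule, `path` is the parent package's __path__.
        # Only claim the import if our marker is in that list.
        if "." in fullname and (path is None or self.MARKER not in path):
            return None
        is_pkg = SOURCES[fullname][1]
        spec = importlib.util.spec_from_loader(
            fullname, self, is_package=is_pkg)
        if is_pkg:
            # This becomes pkg.__path__; the hook chooses its content.
            spec.submodule_search_locations = [self.MARKER]
        return spec

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(SOURCES[module.__name__][0], module.__dict__)


sys.meta_path.insert(0, DictFinder())

import memdemo.sub

print(memdemo.__path__, memdemo.sub.ANSWER)
```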

>     Importer object on sys.path:
>         pkg.__path__ must be a list; it's most logical to use
>         an importer object as the only item. Could be the same
>         importer instance that imported the package itself.

Multiple items would make sense too.

>     A hook on sys.path_hooks:
>         pkg.__path__ must be a list and its only item should
>         be a string that the hook can handle itself.

IMO it should allow multiple strings too!
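
A sketch of what multiple strings could look like, again using
today's import machinery as an assumption and not the interface of
the patch.  StoreFinder, STORES, storepkg and the "mem:" strings are
hypothetical.  The hook handles two different __path__ strings, and
each submodule is found via whichever entry provides it.

```python
import importlib.util
import sys
import types

# Each __path__ string maps to its own set of submodule sources.
STORES = {
    "mem:core": {"alpha": "NAME = 'alpha'\n"},
    "mem:extras": {"beta": "NAME = 'beta'\n"},
}


class StoreFinder:
    """Path hook and finder/loader for the strings in STORES."""

    def __init__(self, entry):
        if entry not in STORES:
            raise ImportError(entry)  # let other hooks try this entry
        self.modules = STORES[entry]

    def find_spec(self, fullname, target=None):
        leaf = fullname.rpartition(".")[2]
        if leaf not in self.modules:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        leaf = module.__name__.rpartition(".")[2]
        exec(self.modules[leaf], module.__dict__)


sys.path_hooks.append(StoreFinder)

# A hand-built package whose __path__ holds two hook-specific strings.
pkg = types.ModuleType("storepkg")
pkg.__path__ = ["mem:core", "mem:extras"]
sys.modules["storepkg"] = pkg

import storepkg.alpha
import storepkg.beta

print(storepkg.alpha.NAME, storepkg.beta.NAME)
```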

>     These are just guidelines: a set of hooks could in theory
>     deliberately set pkg.__path__ up so that submodule imports
>     are handled by an entirely different importer. Not sure how
>     useful that would be...

Very useful, for not-yet-imagined cases.

--Guido van Rossum (home page: http://www.python.org/~guido/)