[Zope3-dev] Fwd: Prototype setuptools-specific PyPI index.
Jim Fulton
jim at zope.com
Thu Jul 19 16:10:54 EDT 2007
See the forwarded message. I just added the following in the buildout
section of my ~/.buildout/default.cfg:
index = http://download.zope.org/ppix
Without it, refreshing a small buildout of mine takes 2m44s. With it,
it takes about 15 seconds.
Jim
Begin forwarded message:
> From: Jim Fulton <jim at zope.com>
> Date: July 19, 2007 7:06:34 AM EDT
> To: Distutils-Sig at python.org, catalog-sig at python.org
> Subject: Prototype setuptools-specific PyPI index.
>
> Over the past few months, we've struggled quite a bit with Python
> Package Index (PyPI) performance and stability. Thanks to the
> heroic efforts of Martin v. Löwis and others, performance and
> especially stability have improved quite a bit. Martin has
> demonstrated that, at least when running well, PyPI seems to answer
> most requests in about 7 milliseconds (around 150 requests
> per second) internally. That's not bad. Unfortunately for users,
> actual times can be quite a bit longer. For me at work, requests
> take around 300 milliseconds. For Martin, they seem to take
> somewhat longer. 300 milliseconds isn't so bad for a request or
> two. However, easy_install can easily make tens or even hundreds of
> requests to satisfy a user request for a package. zc.buildout,
> when verifying that a large system with many tens of packages has
> the most up-to-date versions of each package, can easily make
> thousands of requests.
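To make the arithmetic above concrete, here is a minimal sketch. The 7 ms and 300 ms figures come from the message; the function names and the request counts in the loop are only illustrative, and the model assumes requests are issued strictly one after another.

```python
def throughput_per_second(latency_ms):
    """Requests per second one sequential worker can answer at this latency."""
    return 1000.0 / latency_ms


def total_wait_seconds(num_requests, latency_ms):
    """Seconds spent waiting when requests are issued one after another."""
    return num_requests * latency_ms / 1000.0


if __name__ == "__main__":
    # 7 ms per request internally works out to roughly 143 requests/second.
    print(round(throughput_per_second(7)), "requests per second")
    # Illustrative request counts at the 300 ms seen from a client.
    for count in (2, 100, 1000):
        print(count, "requests:", total_wait_seconds(count, 300), "seconds")
```

At 300 ms each, a thousand sequential requests cost about five minutes of pure waiting, which is why the request count matters more than the per-request latency.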
>
> Why do setuptools and buildout make so many requests? If a package
> exposes more than one release, then setuptools checks the package's
> main PyPI page and the pages for each release. We need to be able
> to easily use older releases, so we can't hide old releases.
> Typical projects of ours have many old releases exposed. Even if
> setuptools were more clever in the way it searched PyPI, it
> would still have to make a minimum of 2 requests per package for
> packages with multiple versions exposed.
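The request pattern described above can be modelled roughly as follows. This is a simplified sketch, not setuptools' actual logic: it assumes one read of the main page, one read per exposed release when more than one release is exposed, and one read per off-site page that gets followed. The function name is made up for illustration.

```python
def index_requests(num_releases, num_external_pages=0):
    """Pages read for one package under the simplified model.

    Assumed model: one main page, plus one page per exposed release when
    more than one release is exposed, plus any off-site pages followed.
    """
    release_pages = num_releases if num_releases > 1 else 0
    return 1 + release_pages + num_external_pages


if __name__ == "__main__":
    # 13 exposed releases and one off-site page give 15 reads.
    print(index_requests(13, 1))
```

Under this model, a package with 13 exposed releases and one off-site link costs 15 page reads, while a package with a single release costs one.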
>
> Another potential issue is that PyPI pages can be large. I've
> found it convenient to use PyPI package pages as the home page for
> many of my projects. I like to include package documentation in my
> project pages. Perhaps this is an abuse of PyPI, but it is very
> convenient for me and no one has complained. :) The zc.buildout
> pages are around 200K. That's a fair bit of data for setuptools to
> download and scan for download URLs.
>
> In the course of this discussion, I've realized that it doesn't
> make sense for setuptools to use the same interface that humans
> use. setuptools doesn't need to see all of the data that is useful
> to humans. Similarly, humans generally don't need to see all of the
> historical releases for a project. I suggested a simple page
> format designed just for setuptools. An alternative would be an
> xmlrpc API. I prefer pages because I think that, over time, the
> number of requests from automated tools like easy_install and
> zc.buildout will increase substantially and ultimately, will
> overwhelm dynamic servers, even ones like PyPI that are reasonably
> fast. I also think that a simple static collection of pages will
> be easier to mirror and I think some number of geographic mirrors
> is likely to help some people. I promised to prototype the format
> I suggested.
>
> I've created an experimental prototype setuptools-specific package
> index at
>
> http://download.zope.org/ppix
>
> Going to that page gives brief instructions for using it with
> easy_install and zc.buildout. To see an individual package page,
> add the package name to the URL, as in:
>
> http://download.zope.org/ppix/setuptools/
>
> A few things to note about this:
>
> - I don't expose a long package list at
> http://download.zope.org/ppix/. The long package list would be
> expensive to download and supports a use case that I consider to
> be of negative value, which is installing packages with
> case-insensitive package names. I think it is important for humans
> to be able to search for packages using case-insensitive search
> terms, but I think that, after identifying a package, precise
> package names should be used. I think it is especially important
> that precise package names be used in package requirements.
>
> - There is a single page per package. This can greatly reduce the
> number of requests. Packages that store all of their distributions
> in PyPI and that don't have off-site home pages or download URLs
> can be scanned with a single request. Note that I excluded home
> page and download URLs that pointed back to the package's PyPI page,
> as that wouldn't provide any new information to setuptools.
>
> - Download URLs for *hidden* packages are included. Humans don't
> need to see old revisions, but setuptools-based tools do. If we
> used an index like this for setuptools, we could stop unhiding old
> releases when we created new releases in PyPI. This would make
> PyPI more useful to humans and less of a pain for developers.
>
> - Download URLs are the same as they are in PyPI. Using this new
> index, distributions are still downloaded from PyPI, so the index
> doesn't affect PyPI download statistics.
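A tool consuming a single flat page per package only has to collect the links on it. The sketch below shows one way to do that with the Python standard library. The markup of the real ppix pages is not shown in this message, so SAMPLE_PAGE is a hypothetical page; the class and function names are illustrative too. Only the two URLs are taken from the transcripts in this message.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def download_links(page_html):
    """Return (url, md5-or-None) pairs for every link on the page."""
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    results = []
    for href in collector.links:
        # Split off the "#md5=..." fragment that carries the checksum.
        url, fragment = urldefrag(href)
        md5 = fragment[len("md5="):] if fragment.startswith("md5=") else None
        results.append((url, md5))
    return results


# Hypothetical page layout; only the URLs come from the message.
SAMPLE_PAGE = """
<html><body>
<a href="http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d">zc.buildout-1.0.0b28-py2.5.egg</a><br/>
<a href="http://svn.zope.org/zc.buildout">home page</a><br/>
</body></html>
"""

if __name__ == "__main__":
    for url, md5 in download_links(SAMPLE_PAGE):
        print(url, md5)
```

Because everything a tool needs is on one small page, a package with no off-site links can be resolved with a single request and a single parse.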
>
> To see the impact of this, it's interesting to look at installing
> zc.buildout using easy_install from PyPI and from the experimental
> index:
>
> Installing using PyPI looks like this:
>
> (env)jim at ds9:~/tmp$ time easy_install zc.buildout
> Searching for zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
> Reading http://svn.zope.org/zc.buildout
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
> Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
> Best match: zc.buildout 1.0.0b28
> Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
> Processing zc.buildout-1.0.0b28-py2.5.egg
> creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
> Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
> Adding zc.buildout 1.0.0b28 to easy-install.pth file
> Installing buildout script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
> Processing dependencies for zc.buildout
> Searching for setuptools==0.6c6
> Best match: setuptools 0.6c6
> Processing setuptools-0.6c6-py2.5.egg
> Adding setuptools 0.6c6 to easy-install.pth file
> Installing easy_install script to /home/jim/tmp/env/bin/
> Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
> Processing dependencies for setuptools==0.6c6
> Finished processing dependencies for setuptools==0.6c6
> Finished installing setuptools==0.6c6
> Finished processing dependencies for zc.buildout
> Finished installing zc.buildout
>
> real 0m31.360s
> user 0m1.136s
> sys 0m0.060s
>
> Note the large number of pages read. Here I was installing a
> single package with one dependency, setuptools, that was already
> installed. Let's look at this again using the experimental index:
>
> (env)jim at ds9:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
> Searching for zc.buildout
> Reading http://download.zope.org/ppix/zc.buildout/
> Best match: zc.buildout 1.0.0b28
> Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
> Processing zc.buildout-1.0.0b28-py2.5.egg
> creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
> Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
> Adding zc.buildout 1.0.0b28 to easy-install.pth file
> Installing buildout script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
> Processing dependencies for zc.buildout
> Searching for setuptools==0.6c6
> Best match: setuptools 0.6c6
> Processing setuptools-0.6c6-py2.5.egg
> Adding setuptools 0.6c6 to easy-install.pth file
> Installing easy_install script to /home/jim/tmp/env/bin/
> Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
> Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
> Processing dependencies for setuptools==0.6c6
> Finished processing dependencies for setuptools==0.6c6
> Finished installing setuptools==0.6c6
> Finished processing dependencies for zc.buildout
> Finished installing zc.buildout
>
> real 0m7.006s
> user 0m0.244s
> sys 0m0.040s
>
> Note:
>
> - We made far fewer requests with the new index
>
> - Most of the time in the second example was spent actually
> downloading the buildout distribution. Most of the time in the
> first example was spent reading the index.
>
> - I used workingenv to create clean environments for each of the
> examples above.
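For anyone repeating the comparison, the two "real" timings can be turned into a speedup factor with a few lines of Python. This is only a convenience sketch. The helper names are made up, and the parser handles just the minutes-and-seconds form that the shell's time builtin printed above.

```python
import re


def parse_time(value):
    """Convert a shell time value such as '0m31.360s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unrecognised time value: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(before, after):
    """How many times faster the 'after' run was than the 'before' run."""
    return parse_time(before) / parse_time(after)


if __name__ == "__main__":
    # The two 'real' values from the transcripts above.
    print("%.2fx faster" % speedup("0m31.360s", "0m7.006s"))
```

For the two runs above this gives a factor of roughly 4.5 in wall-clock time.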
>
> WRT zc.buildout, refreshing a buildout with just ZODB installed in
> it takes about 45 seconds for me using PyPI and about 5 seconds
> using the experimental index.
>
> Some of the speed improvement is due to the fact that the
> experimental index is much closer to me (on the net) than PyPI.
> ATM, requests to PyPI take *me* around 500 milliseconds, while
> requests to the experimental index are taking between 100 and 300
> milliseconds. (I'm at home and this seems to be somewhat
> variable.) Most of the speed improvements are from reducing the
> number of requests.
>
> I'm polling PyPI once a minute to get and apply updates. Thanks to
> the new XML-RPC method that Martin added, this is very efficient to
> do.
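A polling step of the kind described above might look like the sketch below. The message does not name the XML-RPC method, so the sketch assumes it is a changelog(since) call returning rows that start with the package name; treat that, and every function name here, as an assumption rather than a description of the actual mirroring software.

```python
import time
import xmlrpc.client


def changed_packages(changelog_rows):
    """Return the set of package names mentioned in the changelog rows.

    Assumption: each row starts with the package name, for example
    (name, version, timestamp, action).
    """
    return {row[0] for row in changelog_rows}


def poll_once(fetch_changes, since, rebuild_page):
    """Rebuild pages for packages changed after `since`; return the next `since`."""
    # Take the timestamp before fetching so that changes arriving during
    # the fetch are picked up again by the next poll instead of being lost.
    now = int(time.time())
    for name in sorted(changed_packages(fetch_changes(since))):
        rebuild_page(name)
    return now


def make_xmlrpc_fetcher(url):
    """Assumption: the server at `url` exposes an XML-RPC changelog(since) method."""
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.changelog
```

A once-a-minute loop would call poll_once, keep the returned timestamp, sleep for 60 seconds, and repeat. Passing the fetch function in as a parameter keeps the step testable without touching the network.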
>
> I encourage people to check this out and even try using it with
> easy_install and especially buildout. AFAIK, aside from being much
> faster and showing download files for hidden releases, it is
> completely equivalent to PyPI for setuptools use. My intention is
> to keep this experimental index going and up to date for the
> foreseeable future, and I plan to use it for all my work.
>
> My primary goal is to prototype the new index format. If this
> seems useful, then I think that www.python.org should expose an
> index in this format to setuptools, either at a different URL or by
> satisfying setuptools requests from the index based on client
> information. I'd love to see this index populated via a baking
> mechanism that updates package pages when they change, rather than
> through polling as I'm doing.
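To illustrate what "baking" could mean here, the sketch below regenerates one static page whenever a package changes, so the web server only ever serves files. The directory layout (one index.html per package directory), the page markup, and all names are assumptions made for the example; they are not taken from the prototype.

```python
import html
import os
import tempfile

# Download URL copied from the transcripts in this message.
EXAMPLE_URL = (
    "http://cheeseshop.python.org/packages/2.5/z/zc.buildout/"
    "zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d"
)


def render_page(name, links):
    """Render a minimal page with one link per (url, label) pair."""
    rows = "\n".join(
        '<a href="%s">%s</a><br/>' % (html.escape(url, quote=True), html.escape(label))
        for url, label in links
    )
    return "<html><head><title>%s</title></head><body>\n%s\n</body></html>\n" % (
        html.escape(name),
        rows,
    )


def bake_page(root, name, links):
    """Write <root>/<name>/index.html and return its path."""
    package_dir = os.path.join(root, name)
    os.makedirs(package_dir, exist_ok=True)
    target = os.path.join(package_dir, "index.html")
    temporary = target + ".tmp"
    with open(temporary, "w", encoding="utf-8") as page:
        page.write(render_page(name, links))
    # Rename into place so readers never see a half-written page.
    os.replace(temporary, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(bake_page(root, "zc.buildout", [(EXAMPLE_URL, "zc.buildout-1.0.0b28-py2.5.egg")]))
```

A page baked this way is only rewritten when its package changes, and the resulting tree of static files is straightforward to copy to a mirror.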
>
> There would be some benefit to having geographic mirrors. I
> suspect that having such mirrors available would improve
> performance further, at least for some folks. It might also be
> useful to have some mirrors for redundancy purposes. Note though
> that what I'm doing is mirroring only the index data. I'm not
> mirroring distributions. Of course, I'd be happy to make my
> software available. (It already is via our subversion repository.)
>
> I hope this effort spurs useful discussion and progress.
>
> Jim
>
> --
> Jim Fulton mailto:jim at zope.com Python Powered!
> CTO (540) 361-1714 http://www.python.org
> Zope Corporation http://www.zope.com http://www.zope.org
>
>
>
--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org