[Zope3-dev] Fwd: Prototype setuptools-specific PyPI index.

Jim Fulton jim at zope.com
Thu Jul 19 16:10:54 EDT 2007


See the forwarded message. I just added the following in the buildout  
section of my ~/.buildout/default.cfg:

index = http://download.zope.org/ppix

Without it, refreshing a small buildout of mine takes 2m44s. With it,  
it takes about 15 seconds.

Jim

Begin forwarded message:

> From: Jim Fulton <jim at zope.com>
> Date: July 19, 2007 7:06:34 AM EDT
> To: Distutils-Sig at python.org, catalog-sig at python.org
> Subject: Prototype setuptools-specific PyPI index.
>
> Over the past few months, we've struggled quite a bit with Python  
> Package Index (PyPI) performance and stability.  Thanks to the  
> heroic efforts of Martin v. Löwis and others, performance and  
> especially stability have improved quite a bit. Martin has  
> demonstrated that, at least when running well, PyPI seems to answer  
> most requests on the order of 7 milliseconds (around 150 requests
> per second) internally.  That's not bad.  Unfortunately for users,
> actual times can be quite a bit longer.  For me at work, requests
> take around 300 milliseconds.  For Martin, they seem to take
> somewhat longer.  300 milliseconds isn't so bad for a request or
> two.  However, easy_install can easily make tens or even hundreds of
> requests to satisfy a user request for a package.  zc.buildout,
> when verifying that a large system with many tens of packages has
> the most up to date versions of each package, can easily make
> thousands of requests.
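
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The function name is made up for illustration, and it assumes the requests are made one after another with nothing else contributing to the wait, which is a simplification.

```python
def total_wait_seconds(request_count, latency_ms=300):
    """Estimate the time spent waiting on sequential index requests.

    latency_ms defaults to the roughly 300 ms per request quoted above.
    """
    return request_count * latency_ms / 1000


# Tens, hundreds and thousands of requests at about 300 ms each.
for count in (10, 100, 1000):
    print(count, "requests:", total_wait_seconds(count), "seconds")
```

At that latency, a thousand requests already cost about five minutes of pure waiting.
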
>
> Why do setuptools and buildout make so many requests?  If a package  
> exposes more than one release, then setuptools checks the package's  
> main PyPI page and the pages for each release.  We need to be able  
> to easily use older releases, so we can't hide old releases.   
> Typical projects of ours have many old releases exposed.
> Setuptools could be more clever in the way it searches PyPI, but it
> would still have to make a minimum of 2 requests per package for
> packages with multiple versions exposed.
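
The rule described in that paragraph can be written down as a tiny Python sketch. The function name is hypothetical, and the single-release case (one request) is an assumption, since the message only describes packages that expose more than one release. Pages outside PyPI, such as a project home page, are not counted here.

```python
def index_pages_read(exposed_releases):
    """Estimate how many PyPI pages are read for one package.

    With more than one exposed release, the main package page is read
    plus one page per release. The single-release case is an assumption.
    """
    if exposed_releases > 1:
        return 1 + exposed_releases
    return 1


# zc.buildout exposes 13 releases in the transcript later in this message.
print(index_pages_read(13))
```
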
>
> Another potential issue is that PyPI pages can be large.  I've  
> found it convenient to use PyPI package pages as the home page for  
> many of my projects.  I like to include package documentation in my  
> project pages.  Perhaps this is an abuse of PyPI, but it is very  
> convenient for me and no one has complained. :)  The zc.buildout  
> pages are around 200K.  That's a fair bit of data for setuptools to  
> download and scan for download URLs.
>
> In the course of this discussion, I've realized that it doesn't  
> make sense for setuptools to use the same interface that humans  
> use.  setuptools doesn't need to see all of the data that is useful  
> to humans. Similarly, humans generally don't need to see all of the  
> historical releases for a project.  I suggested a simple page  
> format designed just for setuptools.  An alternative would be an  
> XML-RPC API.  I prefer pages because I think that, over time, the
> number of requests from automated tools like easy_install and
> zc.buildout will increase substantially and ultimately, will  
> overwhelm dynamic servers, even ones like PyPI that are reasonably  
> fast.  I also think that a simple static collection of pages will  
> be easier to mirror and I think some number of geographic mirrors  
> is likely to help some people.  I promised to prototype the format  
> I suggested.
>
> I've created an experimental prototype setuptools-specific package
> index at
>
>   http://download.zope.org/ppix
>
> Going to that page gives brief instructions for using it with  
> easy_install and zc.buildout.  To see an individual package page,  
> add the package name to the URL, as in:
>
>   http://download.zope.org/ppix/setuptools/
>
> A few things to note about this:
>
> - I don't expose a long package list at
> http://download.zope.org/ppix/.  The long package list would be
> expensive to download and supports a use case that I consider to be
> of negative value, which is installing packages with
> case-insensitive package names.  I
> think it is important for humans to be able to search for packages  
> using case-insensitive search terms, but I think that, after  
> identifying a package, precise package names should be used.  I  
> think it is especially important that precise package names be used  
> in package requirements.
>
> - There is a single page per package.  This can greatly reduce the  
> number of requests.  Packages that store all of their distributions  
> in PyPI and that don't have off-site home pages or download URLs  
> can be scanned with a single request.  Note that I excluded home  
> page and download URLs that pointed back to the package's PyPI page,
> as that wouldn't provide any new information to setuptools.
>
> - Download URLs for *hidden* packages are included.  Humans don't  
> need to see old revisions, but setuptools-based tools do.  If we  
> used an index like this for setuptools, we could stop unhiding old  
> releases when we created new releases in PyPI.  This would make  
> PyPI more useful to humans and less of a pain for developers.
>
> - Download URLs are the same as they are in PyPI.  Using this new  
> index, distributions are still downloaded from PyPI, so the index  
> doesn't affect PyPI download statistics.
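
As a rough illustration of the points above, a single per-package page could be generated along these lines in Python. This is only a sketch, not the software behind the prototype: the function names and the HTML layout are assumptions, and the URLs are taken from the zc.buildout transcript in this message.

```python
from html import escape

# Assumption: human-facing package pages live under this prefix.
PYPI_PAGE_PREFIX = "http://cheeseshop.python.org/pypi/"


def points_back_to_pypi(name, url):
    """True if url is the package's own PyPI page or a page below it."""
    own_page = PYPI_PAGE_PREFIX + name
    return url.rstrip("/") == own_page or url.startswith(own_page + "/")


def render_package_page(name, links):
    """Return one small HTML page listing the links worth scanning.

    links holds download URLs for all releases (hidden ones included)
    plus any home page or download URLs. Links that point back to the
    package's PyPI page are dropped because they add no information.
    """
    kept = [url for url in links if not points_back_to_pypi(name, url)]
    rows = "\n".join(
        '<a href="%s">%s</a><br/>' % (escape(url, quote=True), escape(url))
        for url in kept
    )
    return "<html><body>\n%s\n</body></html>\n" % rows


page = render_package_page("zc.buildout", [
    "http://cheeseshop.python.org/pypi/zc.buildout",
    "http://svn.zope.org/zc.buildout",
    "http://cheeseshop.python.org/packages/2.5/z/zc.buildout/"
    "zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d",
])
print(page)
```
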
>
> To see the impact of this, it's interesting to look at installing  
> zc.buildout using easy_install from PyPI and from the experimental  
> index.
>
> Installing using PyPI looks like this:
>
>   (env)jim at ds9:~/tmp$ time easy_install zc.buildout
>   Searching for zc.buildout
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
>   Reading http://svn.zope.org/zc.buildout
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
>   Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
>   Best match: zc.buildout 1.0.0b28
>   Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
>   Processing zc.buildout-1.0.0b28-py2.5.egg
>   creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>   Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
>   Adding zc.buildout 1.0.0b28 to easy-install.pth file
>   Installing buildout script to /home/jim/tmp/env/bin/
>
>   Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>   Processing dependencies for zc.buildout
>   Searching for setuptools==0.6c6
>   Best match: setuptools 0.6c6
>   Processing setuptools-0.6c6-py2.5.egg
>   Adding setuptools 0.6c6 to easy-install.pth file
>   Installing easy_install script to /home/jim/tmp/env/bin/
>   Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
>   Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
>   Processing dependencies for setuptools==0.6c6
>   Finished processing dependencies for setuptools==0.6c6
>   Finished installing setuptools==0.6c6
>   Finished processing dependencies for zc.buildout
>   Finished installing zc.buildout
>
>   real	0m31.360s
>   user	0m1.136s
>   sys	0m0.060s
>
> Note the large number of pages read.  Here I was installing a  
> single package with one dependency, setuptools, that was already  
> installed. Let's look at this again using the experimental index:
>
>   (env)jim at ds9:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
>   Searching for zc.buildout
>   Reading http://download.zope.org/ppix/zc.buildout/
>   Best match: zc.buildout 1.0.0b28
>   Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
>   Processing zc.buildout-1.0.0b28-py2.5.egg
>   creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>   Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
>   Adding zc.buildout 1.0.0b28 to easy-install.pth file
>   Installing buildout script to /home/jim/tmp/env/bin/
>
>   Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
>   Processing dependencies for zc.buildout
>   Searching for setuptools==0.6c6
>   Best match: setuptools 0.6c6
>   Processing setuptools-0.6c6-py2.5.egg
>   Adding setuptools 0.6c6 to easy-install.pth file
>   Installing easy_install script to /home/jim/tmp/env/bin/
>   Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
>
>   Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
>   Processing dependencies for setuptools==0.6c6
>   Finished processing dependencies for setuptools==0.6c6
>   Finished installing setuptools==0.6c6
>   Finished processing dependencies for zc.buildout
>   Finished installing zc.buildout
>
>   real	0m7.006s
>   user	0m0.244s
>   sys	0m0.040s
>
> Note:
>
> - We made far fewer requests with the new index
>
> - Most of the time in the second example was spent actually  
> downloading the buildout distribution.  Most of the time in the  
> first example was spent reading the index.
>
> - I used workingenv to create clean environments for each of the  
> examples above.
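
The difference in request counts can be checked mechanically by counting the "Reading" lines in each transcript. The Python sketch below does that. The helper name is made up, and the sample text is the relevant part of the two runs shown above.

```python
def count_index_reads(transcript):
    """Count the pages easy_install reported reading in its output."""
    return sum(
        1 for line in transcript.splitlines()
        if line.strip().startswith("Reading ")
    )


# Start of the run against PyPI, copied from the first transcript.
PYPI_RUN = """\
Searching for zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
Reading http://svn.zope.org/zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
Best match: zc.buildout 1.0.0b28
"""

# Start of the run against the experimental index.
PPIX_RUN = """\
Searching for zc.buildout
Reading http://download.zope.org/ppix/zc.buildout/
Best match: zc.buildout 1.0.0b28
"""

print(count_index_reads(PYPI_RUN))  # 15
print(count_index_reads(PPIX_RUN))  # 1
```
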
>
> WRT zc.buildout, refreshing a buildout with just ZODB installed in  
> it takes about 45 seconds for me using PyPI and about 5 seconds  
> using the experimental index.
>
> Some of the speed improvement is due to the fact that the
> experimental index is much closer to me (on the net) than PyPI.   
> ATM, requests to PyPI take *me* around 500 milliseconds, while  
> requests to the experimental index are taking between 100 and 300  
> milliseconds. (I'm at home and this seems to be somewhat  
> variable.)  Most of the improvement, though, comes from reducing
> the number of requests.
>
> I'm polling PyPI once a minute to get and apply updates. Thanks to  
> the new XML-RPC method that Martin added, this is very efficient to  
> do.
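
One polling pass of the kind described above might look like the following Python sketch. The message does not name the XML-RPC method, so `fetch_changes` is a hypothetical stand-in for whatever call returns the packages changed since a given timestamp, and `apply_change` stands in for regenerating a package page.

```python
import time


def poll_for_changes(fetch_changes, apply_change, since, now=time.time):
    """Run one polling pass and return the timestamp for the next pass.

    fetch_changes(since) is a hypothetical callable that returns the
    changes recorded after `since`. The start time is captured before
    fetching so that changes arriving during the pass are picked up by
    the next one. A change may therefore be seen twice, so apply_change
    should be safe to repeat.
    """
    started = int(now())
    for change in fetch_changes(since):
        apply_change(change)
    return started


# Example with fake data instead of a real server.
updated = []
next_since = poll_for_changes(
    lambda since: ["zc.buildout", "ZODB3"] if since == 100 else [],
    updated.append,
    since=100,
    now=lambda: 160.7,
)
print(updated, next_since)
```

Calling this once a minute, passing the returned value back in as `since`, gives the polling loop.
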
>
> I encourage people to check this out and even try using it with  
> easy_install and especially buildout. AFAIK, aside from being much
> faster and showing download files for hidden releases, it is
> completely equivalent to PyPI for setuptools use.  My intention is
> to keep this experimental index going and up to date for the
> foreseeable future, and I plan to use it for all my work.
>
> My primary goal is to prototype the new index format.  If this  
> seems useful, then I think that www.python.org should expose an  
> index in this format to setuptools, either at a different URL or by  
> satisfying setuptools requests from the index based on client  
> information.  I'd love to see this index populated via a baking  
> mechanism that updates package pages when they change, rather than  
> through polling as I'm doing.
>
> There would be some benefit to having geographic mirrors.  I  
> suspect that having such mirrors available would improve  
> performance further, at least for some folks.  It might also be  
> useful to have some mirrors for redundancy purposes.  Note though  
> that what I'm doing is mirroring only the index data. I'm not
> mirroring distributions.  Of course, I'd be happy to make my  
> software available. (It already is via our subversion repository.)
>
> I hope this effort spurs useful discussion and progress.
>
> Jim
>
> --
> Jim Fulton			mailto:jim at zope.com		Python Powered!
> CTO 				(540) 361-1714			http://www.python.org
> Zope Corporation	http://www.zope.com		http://www.zope.org
>
>
>





