[Checkins] SVN: zc.buildout/branches/tlotze-download-hard-links/ implemented new 'shared' parameter for calls to the download utility, tests need to be made to pass on all relevant platforms
Thomas Lotze
tl at gocept.com
Wed Mar 2 10:48:00 EST 2011
Log message for revision 120677:
implemented new 'shared' parameter for calls to the download utility, tests need to be made to pass on all relevant platforms
Changed:
U zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt
U zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py
U zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt
-=-
Modified: zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt 2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt 2011-03-02 15:48:00 UTC (rev 120677)
@@ -15,6 +15,12 @@
- Made sure to download extended configuration files only once per buildout
run even if they are referenced multiple times (patch by Rafael Monnerat).
+- Added a new keyword argument, ``shared``, to calls of the download utility.
+ This makes the optimisation of hard-linking the downloaded resource between
+ the cache and the download target explicit and changes the default
+ behaviour to creating copies of files, which is safer because it isolates
+ buildouts better from the cache and from each other.
+
Bugs fixed:
- In the download module, fixed the handling of directories that are pointed
Modified: zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py 2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py 2011-03-02 15:48:00 UTC (rev 120677)
@@ -1,6 +1,6 @@
##############################################################################
#
-# Copyright (c) 2009 Zope Foundation and Contributors.
+# Copyright (c) 2009-2011 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
@@ -83,29 +83,33 @@
if self.download_cache is not None:
return os.path.join(self.download_cache, self.namespace or '')
- def __call__(self, url, md5sum=None, path=None):
+ def __call__(self, url, md5sum=None, path=None, shared=False):
"""Download a file according to the utility's configuration.
url: URL to download
md5sum: MD5 checksum to match
path: where to place the downloaded file
+ shared: whether to attempt hard-linking multiple copies of the
+ resource in the file system (cached copy, target path etc.)
Returns the path to the downloaded file.
"""
if self.cache:
- local_path, is_temp = self.download_cached(url, md5sum)
+ local_path, is_temp = self.download_cached(url, md5sum, shared)
else:
- local_path, is_temp = self.download(url, md5sum, path)
+ local_path, is_temp = self.download(url, md5sum, path, shared)
- return locate_at(local_path, path), is_temp
+ return locate_at(local_path, path, shared), is_temp
- def download_cached(self, url, md5sum=None):
+ def download_cached(self, url, md5sum=None, shared=False):
"""Download a file from a URL using the cache.
This method assumes that the cache has been configured. Optionally, it
raises a ChecksumError if a cached copy of a file has an MD5 mismatch,
- but will not remove the copy in that case.
+ but will not remove the copy in that case. If the resource comes from
+ the file system or is to be stored at a target path and shared is
+ true, the file may be hard-linked instead of copied.
"""
if not os.path.exists(self.download_cache):
@@ -125,7 +129,8 @@
is_temp = False
if self.fallback:
try:
- _, is_temp = self.download(url, md5sum, cached_path)
+ _, is_temp = self.download(
+ url, md5sum, cached_path, shared)
except ChecksumError:
raise
except Exception:
@@ -139,17 +144,19 @@
else:
self.logger.debug('Cache miss; will cache %s as %s' %
(url, cached_path))
- _, is_temp = self.download(url, md5sum, cached_path)
+ _, is_temp = self.download(url, md5sum, cached_path, shared)
return cached_path, is_temp
- def download(self, url, md5sum=None, path=None):
+ def download(self, url, md5sum=None, path=None, shared=False):
"""Download a file from a URL to a given or temporary path.
An online resource is always downloaded to a temporary file and moved
to the specified path only after the download is complete and the
checksum (if given) matches. If path is None, the temporary file is
- returned and the client code is responsible for cleaning it up.
+ returned and the client code is responsible for cleaning it up. If the
+ resource comes from the file system and shared is true, the existing
+ file may be hard-linked instead of copied.
"""
# Make sure the drive letter in windows-style file paths isn't
@@ -165,7 +172,7 @@
raise ChecksumError(
'MD5 checksum mismatch for local resource at %r.' %
url_path)
- return locate_at(url_path, path), False
+ return locate_at(url_path, path, shared), False
if self.offline:
raise zc.buildout.UserError(
@@ -246,15 +253,20 @@
os.remove(path)
-def locate_at(source, dest):
+def locate_at(source, dest, shared):
if dest is None or realpath(dest) == realpath(source):
return source
if os.path.isdir(source):
shutil.copytree(source, dest)
- else:
+ elif shared:
try:
+ if os.path.exists(dest):
+ os.unlink(dest)
os.link(source, dest)
except (AttributeError, OSError):
shutil.copyfile(source, dest)
+ else:
+ shutil.copyfile(source, dest)
+
return dest
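To try out the patched `locate_at` behaviour outside a buildout run, the following is a standalone copy of its branching logic that can be executed directly. It is a sketch for experimentation, not the module itself: `realpath` is assumed here to be `os.path.realpath` (the real module may use its own variant), and the temporary-directory demo is purely illustrative.

```python
import os
import shutil
import tempfile
from os.path import realpath  # assumption: stands in for the module's realpath


def locate_at(source, dest, shared):
    """Place `source` at `dest`, hard-linking only if `shared` is true.

    Mirrors the branches of the patched zc.buildout.download.locate_at.
    """
    if dest is None or realpath(dest) == realpath(source):
        # Nothing to do: the caller uses the source path directly.
        return source
    if os.path.isdir(source):
        # Directories are always copied; hard links don't apply to them.
        shutil.copytree(source, dest)
    elif shared:
        try:
            # os.link refuses to overwrite, so remove a stale target first.
            if os.path.exists(dest):
                os.unlink(dest)
            os.link(source, dest)
        except (AttributeError, OSError):
            # No os.link on this platform, cross-device link, etc.
            shutil.copyfile(source, dest)
    else:
        # Default: an independent copy isolates dest from the cache.
        shutil.copyfile(source, dest)
    return dest


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    try:
        cached = os.path.join(workdir, "cached.txt")
        with open(cached, "w") as f:
            f.write("This is a foo text.")

        copied = locate_at(cached, os.path.join(workdir, "copy.txt"), shared=False)
        linked = locate_at(cached, os.path.join(workdir, "link.txt"), shared=True)

        print(os.path.samefile(cached, copied))  # separate file
        print(os.path.samefile(cached, linked))  # same file where hard links work
    finally:
        shutil.rmtree(workdir)
```

Where `os.link` fails, the `shared=True` call silently falls back to a copy, so the second `samefile` check reports False there; that is the platform dependence the log message warns about.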
Modified: zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt 2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt 2011-03-02 15:48:00 UTC (rev 120677)
@@ -445,7 +445,54 @@
>>> cat(cache, 'foo.txt')
The wrong text.
+Clean up:
+>>> remove(cache, 'foo.txt')
+
+
+Using shared copies of a downloaded resource
+--------------------------------------------
+
+When downloading large files and using both the cache and a download target,
+it may be desirable to avoid creating multiple copies of the same resource in
+the file system and instead save disk space by using hard links. The same
+is true for "downloading" file-system resources, either to the cache or to a
+download target. Note that the download utility can only attempt to use hard
+links; if that isn't possible, for example because the OS doesn't support hard
+links or because the paths to be linked reside on different storage devices,
+plain copies of the resource are made instead.
+
+As using hard links results in the same files on disk being shared by more
+than one buildout, the download utility copies files by default and leaves it
+to client code to indicate that it is safe to attempt the hard-link
+optimisation. If the client code (e.g., a recipe) is going to modify the
+downloaded file, the default behaviour (no optimisation) is appropriate to
+avoid clobbering the cached copy of the resource:
+
+>>> download = Download(cache=cache)
+>>> path = join(target_dir, 'downloaded.txt')
+>>> cat(download(server_url+'foo.txt', path=path)[0])
+This is a foo text.
+
+>>> write(path, 'garbage')
+>>> cat(cache, 'foo.txt')
+This is a foo text.
+
+If, on the other hand, the client code knows for sure that the copy of the
+resource it is downloading will only ever be read, the optimisation may be
+attempted by passing ``shared=True``. Modifying the downloaded file, as done
+here for demonstration purposes only, then clobbers the cached (and thereby
+shared) copy:
+
+>>> cat(download(server_url+'foo.txt', path=path, shared=True)[0])
+This is a foo text.
+
+# XXX make this test pass on all operating systems buildout cares about
+
+>>> write(path, 'garbage')
+>>> cat(cache, 'foo.txt')
+garbage
+
+
Configuring the download utility from buildout options
------------------------------------------------------
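The XXX note in the new doctest points out that its last example only passes where hard links actually work between the cache and the target directory. One possible approach, sketched here with a hypothetical helper `hard_links_supported` that is not part of zc.buildout, is to probe for link support with a throwaway file before deciding what output to expect.

```python
import os
import tempfile


def hard_links_supported(source_dir, target_dir):
    """Return True if a file in source_dir can be hard-linked into target_dir.

    Hypothetical helper: probes with a throwaway file and always cleans up.
    """
    link = getattr(os, "link", None)
    if link is None:
        # Python builds without os.link (e.g. old Windows Pythons).
        return False
    fd, probe = tempfile.mkstemp(dir=source_dir)
    os.close(fd)
    linked = os.path.join(target_dir, os.path.basename(probe) + ".link")
    try:
        link(probe, linked)
    except OSError:
        # e.g. EXDEV (different devices) or a file system without hard links.
        return False
    else:
        os.remove(linked)
        return True
    finally:
        # Runs on both paths, so the probe file never lingers.
        os.remove(probe)
```

With such a probe, the expected output of the final `cat(cache, 'foo.txt')` could be chosen as `garbage` when links work between `cache` and `target_dir`, and as the original text otherwise.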