[Checkins] SVN: zc.buildout/branches/tlotze-download-hard-links/ implemented new 'shared' parameter for calls to the download utility; tests still need to be made to pass on all relevant platforms

Thomas Lotze tl at gocept.com
Wed Mar 2 10:48:00 EST 2011


Log message for revision 120677:
  implemented new 'shared' parameter for calls to the download utility; tests still need to be made to pass on all relevant platforms

Changed:
  U   zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt
  U   zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py
  U   zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt

-=-
Modified: zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt	2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/CHANGES.txt	2011-03-02 15:48:00 UTC (rev 120677)
@@ -15,6 +15,12 @@
 - Made sure to download extended configuration files only once per buildout
   run even if they are referenced multiple times (patch by Rafael Monnerat).
 
+- Added a new keyword argument, ``shared``, to calls to the download utility.
+  This makes the optimisation of hard-linking the downloaded resource between
+  the cache and the download target more explicit and changes the default
+  behaviour to creating copies of files, which is safer as it isolates
+  buildouts better from the cache and from each other.
+
 Bugs fixed:
 
 - In the download module, fixed the handling of directories that are pointed

Modified: zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py	2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.py	2011-03-02 15:48:00 UTC (rev 120677)
@@ -1,6 +1,6 @@
 ##############################################################################
 #
-# Copyright (c) 2009 Zope Foundation and Contributors.
+# Copyright (c) 2009-2011 Zope Corporation and Contributors.
 # All Rights Reserved.
 #
 # This software is subject to the provisions of the Zope Public License,
@@ -83,29 +83,33 @@
         if self.download_cache is not None:
             return os.path.join(self.download_cache, self.namespace or '')
 
-    def __call__(self, url, md5sum=None, path=None):
+    def __call__(self, url, md5sum=None, path=None, shared=False):
         """Download a file according to the utility's configuration.
 
         url: URL to download
         md5sum: MD5 checksum to match
         path: where to place the downloaded file
+        shared: whether to attempt hard-linking multiple copies of the
+                resource in the file system (cached copy, target path, etc.)
 
         Returns the path to the downloaded file.
 
         """
         if self.cache:
-            local_path, is_temp = self.download_cached(url, md5sum)
+            local_path, is_temp = self.download_cached(url, md5sum, shared)
         else:
-            local_path, is_temp = self.download(url, md5sum, path)
+            local_path, is_temp = self.download(url, md5sum, path, shared)
 
-        return locate_at(local_path, path), is_temp
+        return locate_at(local_path, path, shared), is_temp
 
-    def download_cached(self, url, md5sum=None):
+    def download_cached(self, url, md5sum=None, shared=False):
         """Download a file from a URL using the cache.
 
         This method assumes that the cache has been configured. Optionally, it
         raises a ChecksumError if a cached copy of a file has an MD5 mismatch,
-        but will not remove the copy in that case.
+        but will not remove the copy in that case. If the resource comes from
+        the file system or shall be stored at a target path, an optimisation
+        may be attempted to share the file instead of copying it.
 
         """
         if not os.path.exists(self.download_cache):
@@ -125,7 +129,8 @@
             is_temp = False
             if self.fallback:
                 try:
-                    _, is_temp = self.download(url, md5sum, cached_path)
+                    _, is_temp = self.download(
+                        url, md5sum, cached_path, shared)
                 except ChecksumError:
                     raise
                 except Exception:
@@ -139,17 +144,19 @@
         else:
             self.logger.debug('Cache miss; will cache %s as %s' %
                               (url, cached_path))
-            _, is_temp = self.download(url, md5sum, cached_path)
+            _, is_temp = self.download(url, md5sum, cached_path, shared)
 
         return cached_path, is_temp
 
-    def download(self, url, md5sum=None, path=None):
+    def download(self, url, md5sum=None, path=None, shared=False):
         """Download a file from a URL to a given or temporary path.
 
         An online resource is always downloaded to a temporary file and moved
         to the specified path only after the download is complete and the
         checksum (if given) matches. If path is None, the temporary file is
-        returned and the client code is responsible for cleaning it up.
+        returned and the client code is responsible for cleaning it up. If the
+        resource comes from the file system, an optimisation may be attempted
+        to share the existing file instead of copying it.
 
         """
         # Make sure the drive letter in windows-style file paths isn't
@@ -165,7 +172,7 @@
                 raise ChecksumError(
                     'MD5 checksum mismatch for local resource at %r.' %
                     url_path)
-            return locate_at(url_path, path), False
+            return locate_at(url_path, path, shared), False
 
         if self.offline:
             raise zc.buildout.UserError(
@@ -246,15 +253,20 @@
         os.remove(path)
 
 
-def locate_at(source, dest):
+def locate_at(source, dest, shared):
     if dest is None or realpath(dest) == realpath(source):
         return source
 
     if os.path.isdir(source):
         shutil.copytree(source, dest)
-    else:
+    elif shared:
         try:
+            if os.path.exists(dest):
+                os.unlink(dest)
             os.link(source, dest)
         except (AttributeError, OSError):
             shutil.copyfile(source, dest)
+    else:
+        shutil.copyfile(source, dest)
+
     return dest

Modified: zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt
===================================================================
--- zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt	2011-03-02 14:06:21 UTC (rev 120676)
+++ zc.buildout/branches/tlotze-download-hard-links/src/zc/buildout/download.txt	2011-03-02 15:48:00 UTC (rev 120677)
@@ -445,7 +445,54 @@
 >>> cat(cache, 'foo.txt')
 The wrong text.
 
+Clean up:
 
+>>> remove(cache, 'foo.txt')
+
+
+Using shared copies of a downloaded resource
+--------------------------------------------
+
+When downloading large files and using both the cache and a download target,
+it may be desirable to avoid creating multiple copies of the same resource in
+the file system and rather save disk space by employing hard links. The same
+is true for "downloading" file-system resources, either to the cache or a
+download target. Note that the download utility can only attempt to employ
+hard links; if that fails, e.g. because the OS lacks hard-link support or the
+paths involved reside on different storage devices, plain copies of the
+resource will be made instead.
+
+As using hard links results in the same files on disk being shared by more
+than one buildout, the download utility copies files by default and leaves it
+up to client code to suggest that it is safe to attempt the hard-link
+optimisation. If the client code (e.g., a recipe) is going to modify the
+downloaded file, the default behaviour (no optimisation) is appropriate to
+avoid trashing the cached copy of the resource:
+
+>>> download = Download(cache=cache)
+>>> path = join(target_dir, 'downloaded.txt')
+>>> cat(download(server_url+'foo.txt', path=path)[0])
+This is a foo text.
+
+>>> write(path, 'garbage')
+>>> cat(cache, 'foo.txt')
+This is a foo text.
+
+If, on the other hand, the client code knows for sure that the copy of the
+resource it is downloading will only ever be read, the optimisation may be
+attempted. Modifying the downloaded file for demonstration purposes will
+clobber the cached (and thereby shared) copy:
+
+>>> cat(download(server_url+'foo.txt', path=path, shared=True)[0])
+This is a foo text.
+
+# XXX make this test pass on all OS buildout cares about
+
+>>> write(path, 'garbage')
+>>> cat(cache, 'foo.txt')
+garbage
+
+
 Configuring the download utility from buildout options
 ------------------------------------------------------
 



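For readers who want to try the copy-versus-hard-link behaviour outside of buildout, the following is a minimal standalone sketch modelled on the file branch of locate_at in the diff above. It is not the code from this revision: the function name place_file, the file names, and the temporary directory are made up for illustration, and the sketch assumes a Python 3 interpreter. Whether the hard link actually succeeds depends on the platform and file system, so the example records the outcome instead of assuming it.

```python
import os
import shutil
import tempfile


def place_file(source, dest, shared=False):
    """Hypothetical helper: copy source to dest, or hard-link it if shared."""
    if shared:
        try:
            # A stale file at dest would make os.link fail, so remove it first.
            if os.path.exists(dest):
                os.unlink(dest)
            os.link(source, dest)
            return dest
        except (AttributeError, OSError):
            # No hard-link support, or source and dest are on different
            # devices: fall through to making a plain copy.
            pass
    shutil.copyfile(source, dest)
    return dest


workdir = tempfile.mkdtemp()
cached = os.path.join(workdir, 'cached.txt')
with open(cached, 'w') as f:
    f.write('This is a foo text.')

# Default behaviour: an independent copy, so overwriting it leaves the
# "cached" file untouched.
copied = place_file(cached, os.path.join(workdir, 'copy.txt'))
with open(copied, 'w') as f:
    f.write('garbage')
with open(cached) as f:
    cached_after_copy = f.read()

# Opt-in sharing: both paths refer to the same file if linking worked.
linked = place_file(cached, os.path.join(workdir, 'link.txt'), shared=True)
is_hard_link = os.path.samefile(cached, linked)

shutil.rmtree(workdir)
```

Here cached_after_copy still holds the original text, while is_hard_link is true only where the hard link could be created. That platform dependence is the same reason the new doctest carries an XXX note about passing on every OS.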