[ZODB-Dev] Automatic ZODB packing

Greg Ward gward@mems-exchange.org
Wed, 16 May 2001 09:06:54 -0400


On 16 May 2001, Herbert Kwong said:
> Is there any external method or Zope products that
> enable the automatic packing of ZODB when it reaches
> certain size or at a certain date?  The size of my
> ZODB seems to grow quite fast so I think it is safer
> to pack
> it before it becomes really too large.

cron is your friend.  We have a cron job that runs nightly: it packs the
database and makes a gzip'ed backup of the pre-pack "old" file.  From
our crontab:

  0 2 * * 1,2,3,4,5 /www/mxpython/mems/scripts/save_db.py

I'll attach the script, but keep the following things in mind:
  * this is not a Zope installation, but ZODB used on its own
  * we use ZEO since we need concurrent access to the database --
    i.e. the web site that sits on top of this ZODB keeps
    running while the pack/backup proceeds
  * the script makes certain assumptions about the behaviour
    of packing a FileStorage through ZEO -- see comments in the script
  * you'll have to change some hard-coded paths
  * you might not want the gzip'ed nightly backup -- easy enough
    to remove that code
  * you could of course modify the script to put in a size check
    of the database file, and skip the pack/backup if it's smaller
    than your threshold -- a sketch of this follows the list
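
For that last point, here's a minimal sketch of what such a size check
might look like near the top of main(), right after option parsing
(PACK_THRESHOLD is a hypothetical constant you'd pick yourself; everything
else is already defined in the attached script):

    # hypothetical threshold: skip the pack/backup until the database
    # file has grown past this many bytes
    PACK_THRESHOLD = 100*1024*1024                # say, 100 MB

    file_size = os.stat(SOURCE_FILE_PATH)[6]      # st_size in bytes
    if file_size < PACK_THRESHOLD:
        log('%s is only %d bytes; skipping pack/backup'
            % (SOURCE_FILE_PATH, file_size), 1, options.verbose)
        sys.exit(0)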

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org

[attachment: save_db.py]

#!/www/python/bin/python 

"""save_db

This script will:
1) check to see if there is enough disk space to copy the relevant files
2) pack zeodb
3) copy mxdb.fs.old to a backup directory

This script needs to run as root since it kills zeo and restarts
zeo and kills quixote to restart it
Intended to be run as a cron job
"""
# created 2001/03/28, EAO

__revision__ = "$Id: save_db.py,v 1.9 2001/05/15 14:26:24 gward Exp $"

import sys, os, time
import getopt
from mems.lib import base

USAGE = """\
usage: %s [options]

save_db: pack the database and copy the backup .old file.

options:

-h, --help     Display this help message
-v, --verbose  Verbose mode - print status to log
"""

# File to be backed up
SOURCE_FILE_PATH = '/www/var/mxdb.fs'

# This script assumes that FileStorage is being used, and will need
# significant surgery if we switch to another Storage class.  Here's
# one dependency: after packing a FileStorage, the old (unpacked) file
# is copied to mxdb.fs.old, and this is the file that we'll actually
# copy.  It's also assumed that the existence of mxdb.fs.old indicates
# that the pack is complete (currently a correct assumption) and that
# it's OK to delete the .old file.

OLD_SOURCE_FILE = SOURCE_FILE_PATH + '.old'

# Directory where the backups will be written
TARGET_DIR = '/www/var/backup'

# Don't pack/copy the database if fewer than FREE_DISK_SPACE bytes would
# be left free after the copy.
FREE_DISK_SPACE = 5*1024*1024           # 5 MB

# Number of minutes to wait for the pack to be completed.
WAIT_TIME = 5

LOG_FILE = '/www/log/backup.log'

class Options:
    def __init__ (self):
        self.help = 0
        self.verbose = 0

def log (msg, threshold=1, verbose=1):
    """Output a message to stdout with a timestamp prefix (but only
    if verbose >= threshold).
    """
    if verbose >= threshold:
        timestamp = time.strftime("[%Y-%m-%d %H:%M:%S] ",
                                  time.localtime(time.time()))
        sys.stdout.write(timestamp + msg + '\n')

def die (msg):
    sys.exit("save_db: error: " + msg + " (database not backed up)\n")
    

def main (prog, args):
    """get the options, check for free disk space, stop zeo, copy the database,
    restart zeo, pack zeo, restart quixote
    """
    usage = USAGE % prog

    # get a new instance of the Options class
    options = Options()

    opt_map = { '-h': "help",
                '-v': "verbose",
                '--help' : "help",
                '--verbose' :"verbose",
              }

    # get options
    try:
        (opts, args) = getopt.getopt(args, "hv", ["help","verbose"])

    except getopt.error, msg:
        sys.exit(str(msg) + '\n\n' + usage)

    # all options are boolean flags, so just set the corresponding attribute
    for (opt, val) in opts:
        setattr(options, opt_map[opt], 1)

    # if help option, print out usage

    if options.help:
        print usage
        sys.exit(0)

    # Create backup directory if it doesn't exist
    if not os.path.exists(TARGET_DIR):
        os.mkdir(TARGET_DIR)
        
    # Check that we have enough disk space to do the copy first:
    # require (free bytes) - 2*(size of mxdb.fs) >= FREE_DISK_SPACE.
    file_size = os.stat(SOURCE_FILE_PATH)[6]        # st_size in bytes
    dstat = os.statvfs(TARGET_DIR)
    free_space = dstat[0] * long(dstat[3])          # f_bsize * f_bfree
    if (free_space - 2*file_size) < FREE_DISK_SPACE:
        die('not enough spare disk space for copy: would leave only %s kB' %
            ((free_space - 2*file_size) / 1024))

    # pack the database - get rid of object versions older than the
    # current time
    log('packing database...', 1, options.verbose)
    if os.path.exists(OLD_SOURCE_FILE):
        os.unlink(OLD_SOURCE_FILE)
        
    base.init_database()
    zodb = base.get_database()
    zodb.pack(time.time())
    base.close_database()

    # The pack() call is asynchronous: it returns immediately while the
    # pack continues running in a separate thread within the ZEO server.
    # Therefore, wait until the .old file is created, up to a maximum of
    # WAIT_TIME minutes.
    
    secs = 0
    while not os.path.exists(OLD_SOURCE_FILE):
        secs += 1
        if secs > WAIT_TIME*60:
            break
        time.sleep(1)

    if not os.path.exists(OLD_SOURCE_FILE):
        die('%s was not created after waiting %s seconds ' %
            (OLD_SOURCE_FILE, secs))

    log('pack completed', 1, options.verbose)

    # generate timestamp to append to the backup filename
    timestamp = time.strftime("%Y%m%d%H%M%S", time.localtime(time.time()))
    basename = os.path.basename(SOURCE_FILE_PATH)
    target_file = os.path.join(TARGET_DIR,
                               "%s.%s.gz" % (basename, timestamp))

    # copy/compress the database
    log('gzipping %s to %s' % (OLD_SOURCE_FILE, target_file),
        1, options.verbose)
    cmd = "gzip -c %s > %s" % (OLD_SOURCE_FILE, target_file)
    status = os.system(cmd)
    if status != 0:
        die("gzip failed")
    log('backup completed', 1, options.verbose)


if __name__ == '__main__':
    main(sys.argv[0], sys.argv[1:])
