[Zope] Zope backup

sean.upton@uniontrib.com sean.upton@uniontrib.com
Fri, 28 Mar 2003 14:18:14 -0800


Copying a FileStorage while transactions are being appended to the end of it
potentially means that the copy completes before a transaction commit is
fully flushed to the file, so there is always the possibility that your
backup will need to have half-written transactions manually truncated before
it can be used on either a replica or a new Zope/ZSS instance. This isn't
that big of a deal, but it is possible to avoid this manual work with
DirectoryStorage.
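
To illustrate what "truncating a half-written transaction" amounts to, here
is a minimal sketch on a toy append-only log. The length-prefixed record
layout and the names HEADER, append_record, last_complete_offset and
repair_copy are assumptions made up for this example; this is NOT the real
FileStorage format, only the general idea of cutting a copy back to its last
complete record.

```python
import struct

# Toy record layout (an assumption for illustration, not FileStorage's format):
# a 4-byte big-endian payload length followed by the payload bytes.
HEADER = struct.Struct(">I")


def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one length-prefixed record to the in-memory log."""
    buf += HEADER.pack(len(payload)) + payload


def last_complete_offset(data: bytes) -> int:
    """Return the offset just past the last fully written record."""
    pos = 0
    while True:
        if pos + HEADER.size > len(data):
            return pos  # no room for another header: stop here
        (length,) = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + length
        if end > len(data):
            return pos  # payload was cut off mid-write
        pos = end


def repair_copy(data: bytes) -> bytes:
    """Drop any trailing partial record from a copy taken mid-write."""
    return data[:last_complete_offset(data)]


log = bytearray()
append_record(log, b"txn-1")
append_record(log, b"txn-2")

# Simulate a copy that finished before the second record was fully flushed.
partial_copy = bytes(log[:-3])
repaired = repair_copy(partial_copy)
```

In this toy case the repaired copy holds only the first record; the second,
partially copied record is discarded.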

The main thing that makes DirectoryStorage better in this regard is that the
backup tools integrate with the storage instead of acting uninformed below
it: they trigger snapshot mode and get a list of files to back up from the
storage software itself (this is quicker and a better guarantee than, say,
using unix 'find' and mtime on a dirstorage directory to do the same thing).
Compared to FileStorage, you do not have the problem of backing up files
that are being written to, because:

	(a) Snapshot mode prevents the object files in HOME/A from being
written to, buffering any writes to those files in HOME/journal and HOME/B
for a later flush once snapshot mode is exited (post-backup).

	(b) Additional transactions and objects are not added to the
directory being backed up.
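
Points (a) and (b) can be sketched with a toy in-memory model. The class
ToySnapshotStore and its method names are hypothetical and exist only for
this example; the real DirectoryStorage works on files under HOME/A,
HOME/journal and HOME/B, and its actual implementation is not shown here.

```python
class ToySnapshotStore:
    """Toy model of snapshot-mode buffering (not DirectoryStorage's code)."""

    def __init__(self):
        self.main = {}      # stands in for HOME/A
        self.journal = {}   # stands in for HOME/journal and HOME/B
        self.snapshot = False

    def write(self, key, value):
        # In snapshot mode, writes are buffered instead of touching main.
        if self.snapshot:
            self.journal[key] = value
        else:
            self.main[key] = value

    def read(self, key):
        # Readers still see the newest data, buffered or not.
        if key in self.journal:
            return self.journal[key]
        return self.main[key]

    def enter_snapshot(self):
        self.snapshot = True

    def files_to_back_up(self):
        # The storage itself reports what to copy: the frozen main area.
        return dict(self.main)

    def leave_snapshot(self):
        # Flush buffered writes into main once the backup is done.
        self.main.update(self.journal)
        self.journal.clear()
        self.snapshot = False


store = ToySnapshotStore()
store.write("obj1", "v1")

store.enter_snapshot()
store.write("obj1", "v2")      # buffered, main is untouched
store.write("obj2", "new")     # new object, not added to main
backup = store.files_to_back_up()
store.leave_snapshot()
```

Here the backup contains only the state from the moment snapshot mode was
entered, while the writes made during the backup show up in the main area
after snapshot mode is left.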

DirectoryStorage also is preferable in these backup scenarios:

1. Disaster-preparedness.  You want to back up a big storage over a WAN
connection, and this means incremental backup.  IIRC something like rsync
may not work very well on a changing FileStorage Data.fs:
http://mail.zope.org/pipermail/zodb-dev/2002-November/003807.html
We run servers at a co-location facility and need remote backup to our
facility over a 1.5Mb/s connection.  A reasonable way to do this is to use
the backup.py tool to create full and incremental files locally that are
pulled down to remote locations via FTP on a cron job or, even better, to
just run the replica.py tool from our secondary location to incrementally
pull down the changes (equiv. to backup.py incremental backup, but for
replica purposes) over an SSH connection to our other location and to tape
for standard offsite backup rotations.  With FileStorage, we would have to
use rsync because of bandwidth constraints, and our ability to respond
quickly would be impeded by the fact that we may have to manually repair the
remote copy of the FileStorage via truncation of half-committed transactions.

2. ZSS High-availability clustering and replication.  We have an HA cluster
currently using Linux-HA heartbeat, and our crude way of copying the Data.fs
is via FTP for daily snapshots in the middle of the night between our
primary and secondary node.  This works okay (not as well as rsync would)
because this application only updates most content once-daily.  However, if
you have a heavier-write situation, FileStorage will not be amenable to a
hot-backup clustering arrangement, because cluster software will not be able
to start the ZEO storage server on the backup/secondary node in the possible
case of a corrupted (even slightly) filestorage copy (someone correct me if
I am wrong here).  The DirectoryStorage replica.py tool addresses this by
providing a secure network-enabled incremental replication mechanism that
ignores incoming writes (via snapshot) to guarantee consistency and
isolation (in a transactional sense) for the backup operation: the backup is
consistent with the state of the storage at the point in time the snapshot
mode was entered (when backup started), and incoming transactions do not
affect the operation of a backup because they are isolated in HOME/journal
and HOME/B while stuff is copied out of HOME/A.  Given this, I feel much
more comfortable that I can keep a 'hot' replica on a 'hot' backup node that
is ready to take over as ZSS in the case of a failure on the primary or
(mainly) the need for maintenance on the primary - and I can feel
comfortable that my backup/replica reflects a recent consistent record of
current heavy activity.

Sean

-----Original Message-----
From: Chris Withers [mailto:chrisw@nipltd.com]
Sent: Friday, March 28, 2003 1:05 PM
To: sean.upton@uniontrib.com
Cc: jccooper@jcameroncooper.com; zope@zope.org
Subject: Re: [Zope] Zope backup


sean.upton@uniontrib.com wrote:
> Though your copy may end up needing repair after the fact; backup in this
> sense is not transactional.  DirectoryStorage has the best answer for this
> at the moment (better than FileStorage),

What led you to this belief?

cheers,

Chris