[ZODB-Dev] ZSS Failover and client caching

sean.upton@uniontrib.com sean.upton@uniontrib.com
Fri, 01 Jun 2001 14:27:59 -0700


I've been thinking about implementing something like this; a few notes.

I haven't used fake, but heartbeat, in most cases, supersedes it.  I am doing
this with some other boxes for network services like proxies.  If the box
dies, so does the heartbeat it sends over a serial or (in my case)
UDP/IP/ethernet link; mon runs on the primary machine, monitoring its own
services, and if it sees a service down, it shuts down the ethernet
interface that is used for UDP heartbeats, which causes the backup box to
panic and take over.  I suppose that one could write a monitor for mon for a
ZEO ZSS: a python script that loads a ZEO client and attempts to
connect to the ZEO server over the appropriate TCP port/interface.  If the
monitor sees that the ZSS is down, it shuts down the ethernet interface that
heartbeat is using.  The backup server then stops seeing heartbeats, and uses
gratuitous ARP to take over its role.
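
As a rough, untested sketch of what such a mon monitor might look like
(it only checks that the ZSS port accepts a TCP connection rather than
doing a full ZEO handshake; the hostname and port below are placeholders):

# Minimal mon-style monitor for a ZEO storage server: exit 0 if the ZSS
# TCP port accepts a connection, print the failed host and exit nonzero
# otherwise.  A fuller check could load ZEO.ClientStorage instead.
import socket
import sys

ZSS_HOST = 'zss-primary'   # placeholder hostname
ZSS_PORT = 9999            # placeholder ZEO port

def zss_is_up(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            s.connect((host, port))
            return 1
        except socket.error:
            return 0
    finally:
        s.close()

if __name__ == '__main__':
    if zss_is_up(ZSS_HOST, ZSS_PORT):
        sys.exit(0)                      # service up
    sys.stdout.write(ZSS_HOST + '\n')    # failed host on stdout
    sys.exit(1)                          # nonzero status tells mon it's down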

The real problem, I think, is not the failover itself - it is
reliable replication.

As an alternative to CODA, use a direct attached storage device (DASD /
external RAID) that connects to 2 hosts (backup and primary) using SCSI.
These devices usually have loads of stuff to prevent a hardware
single-point-of-failure, like dual power, dual controller, spare disks,
RAID10, etc.  The backup has a RAID10 volume on the device mounted RO, while
the primary has it mounted RW.  If the backup takes over, it could remount
the volume where the ODB is stored RW (using a mon 'alert' script running on
the backup box), and use a STONITH (shoot the other node in the head)
method, i.e. kill the power to the primary, to guarantee that the primary
doesn't spontaneously come back on line.
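
A very rough sketch of what that alert script might do; the mount point,
power-switch command, and ZSS start command are all stand-ins for whatever
a real site would use:

# Sketch of a mon 'alert' script for the backup node: fence the primary
# first (so both nodes can never have the volume writable at once), then
# remount the shared volume read-write and start our own ZSS.
import os
import sys

VOLUME = '/export/zodb'                              # assumed mount point of shared volume
STONITH_CMD = '/usr/local/sbin/powercycle primary'   # hypothetical power-switch command
START_ZSS_CMD = '/usr/local/zope/bin/start_zss'      # hypothetical ZSS start script

def run(cmd):
    if os.system(cmd) != 0:
        sys.stderr.write('failed: %s\n' % cmd)
        sys.exit(1)

run(STONITH_CMD)                          # kill power to the primary
run('mount -o remount,rw ' + VOLUME)      # take the volume read-write
run(START_ZSS_CMD)                        # bring up the ZSS on this box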

The problem with either of these approaches (CODA or DASD) is that if the
node dies taking some data down with it, you still have problems: you might
still need some manual intervention to fix your Data.fs.

Could one make some home-grown replication (for now) by using rsync over a
network (or CODA or DASD with a shared ODB), and still have a way to
auto-recover a corrupt Data.fs (strip off incomplete last transactions?),
without human intervention?
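
For what it's worth, here is a rough, untested sketch of the rsync half of
that, run from the backup box: pull the file over, then try to open the
copy read-only as a crude sanity check.  Paths are placeholders, and since
the source file is live, the tail of the copy may still hold a half-written
transaction that would need stripping on recovery.

# Poor man's replication: pull Data.fs from the primary, then see whether
# ZODB can open the copy at all.  Not a substitute for real replication.
import os
import sys

REMOTE = 'primary:/usr/local/zope/var/Data.fs'   # assumed path on the live ZSS
LOCAL = '/usr/local/zope/var/Data.fs'            # assumed path on this backup box

# -a preserves times/permissions, so unchanged runs are cheap.
if os.system('rsync -a %s %s' % (REMOTE, LOCAL)) != 0:
    sys.stderr.write('rsync from primary failed\n')
    sys.exit(1)

# Crude sanity check: open the copy read-only and close it again.
try:
    from ZODB.FileStorage import FileStorage
    fs = FileStorage(LOCAL, read_only=1)
    fs.close()
except Exception:
    sys.stderr.write('copied Data.fs did not open cleanly\n')
    sys.exit(1)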

Sean

-----Original Message-----
From: Michel Pelletier [mailto:michel@digicool.com]
Sent: Friday, June 01, 2001 1:22 PM
To: Dyon Balding
Cc: zodb-dev@zope.org
Subject: Re: [ZODB-Dev] ZSS Failover and client caching


On Fri, 1 Jun 2001, Dyon Balding wrote:

> Hi all,
> 
> I am in the process of setting up a cluster of machines using ZEO. 
> There will be some kind of hardware load balancer in front of the Zope
> clients, which will all be connected to the single ZSS.  I need to
> formulate a plan in the case of the ZSS server failing.
> 
> One option would be to have a backup ZSS that would assume the main
> ZSS's IP once it had been taken off the network.  A problem I see with
> this is that if the Data.fs on the backup system is out of sync with the
> main server, the clients will not realise this because they won't try to
> update their local cache unless they get a cache miss. 

You could, perhaps, try to keep your Data.fs on a replicated filesystem,
like Coda.  If one ZSS failed, you could use an IP spoofer (like
'fake') to make a different ZSS assume the failed IP (as you
describe).  The failover machine would use the same Data.fs as the failed
one because they both use the same underlying replicated Coda
filesystem.  You would not need to restart your clients because the backup
server now looks identical to the failed main server.

I've never tried this, but I can't see what is wrong with it in theory.

> Obviously
> restarting the Zope clients would fix that, but I was hoping for a more
> seamless failover.

But even if the clients restart, aren't you working off of an unsynced,
"older" image of the database?  As you pointed out, the backup system is
"out of sync" with the main server.  Restarting the clients solves the
caching inconsistency, but I think a better approach would be to solve the
ZSS synchronization.

-Michel


_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://lists.zope.org/mailman/listinfo/zodb-dev