[ZODB-Dev] ZSS Failover and client caching

Fri, 1 Jun 2001 13:21:51 -0700 (PDT)

On Fri, 1 Jun 2001, Dyon Balding wrote:

> Hi all,
> 
> I am in the process of setting up a cluster of machines using ZEO. 
> There will be some kind of hardware load balancer in front of the Zope
> clients, which will all be connected to the single ZSS.  I need to
> formulate a plan in the case of the ZSS server failing.
> 
> One option would be to have a backup ZSS that would assume the main
> ZSS's IP once it had been taken off the network.  A problem I see with
> this is that if the Data.fs on the backup system is out of sync with the
> main server, the clients will not realise this because they won't try to
> update their local cache unless they get a cache miss. 

You could, perhaps, try to keep your Data.fs on a replicated filesystem,
like Coda.  If one ZSS failed, you could use an IP spoofer (like
'fake') to make a different ZSS assume the failed IP (as you
describe).  The failover machine would use the same Data.fs as the failed
one because they both use the same underlying replicated Coda
filesystem.  You would not need to restart your clients because the backup
server now looks identical to the failed main server.

I've never tried this, but I can't see what is wrong with it in theory.
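
A minimal, untested sketch of the monitoring half of that idea, in Python.
It only decides when the backup should take over; the takeover itself
(running 'fake' or similar) is left out.  The function names, the
timeout and the "three failed probes" rule are all assumptions made for
illustration, not anything ZEO provides.

```python
import socket


def zss_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to the ZSS can be opened.

    This only proves that something is listening on the port; it does
    not prove the storage server is healthy (an assumption of this sketch).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    sock.close()
    return True


def should_take_over(probe, attempts=3):
    """Return True only if `probe` fails `attempts` times in a row.

    `probe` is any zero-argument callable returning True when the main
    ZSS answers.  Requiring several failures avoids failing over on a
    single dropped connection.
    """
    return not any(probe() for _ in range(attempts))
```

On the backup machine this might be called as
should_take_over(lambda: zss_is_up("zss.example.com", 8100)), where the
host name and port are placeholders for whatever the main ZSS uses.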

> Obviously
> restarting the Zope clients would fix that, but I was hoping for a more
> seamless failover.

But even if the clients restart, aren't you working off of an unsynced,
"older" image of the database?  As you pointed out, the backup system is
"out of sync" with the main server.  Restarting the clients solves the
caching inconsistency, but I think a better approach would be to solve the
ZSS synchronization.
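
To make "out of sync" concrete, here is a rough Python sketch that
compares two copies of a Data.fs.  It relies on FileStorage appending
new transactions to the end of the file, so a backup that is merely
behind should be a byte-for-byte prefix of the main copy.  The function
names and the three status strings are made up for this example, and it
assumes both files are readable from one machine and that neither is
being written to during the check.  A pack rewrites the file, so a copy
taken before a pack will show up as "diverged".

```python
import hashlib
import os


def _digest(path, length, chunk=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                break
            h.update(data)
            remaining -= len(data)
    return h.hexdigest()


def compare_copies(main_path, backup_path):
    """Classify the backup as 'in sync', 'stale' or 'diverged'.

    'stale' means the backup is a strict prefix of the main file, i.e.
    it is consistent but is missing the most recent transactions.
    """
    main_size = os.path.getsize(main_path)
    backup_size = os.path.getsize(backup_path)
    if backup_size > main_size:
        return "diverged"
    # Compare only the part both files have in common.
    if _digest(main_path, backup_size) != _digest(backup_path, backup_size):
        return "diverged"
    return "in sync" if backup_size == main_size else "stale"
```

A "stale" result is exactly the case described above: restarting the
clients would clear their caches, but they would still be reading an
older image of the database.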

-Michel