[ZODB-Dev] zeo server hot failover

Thu Jun 28 18:21:29 UTC 2012

I recently came across a bug report about configuring a ZEO client
storage to list multiple ZEO servers:
https://bugs.launchpad.net/zope2/+bug/143843 . I had not realized that was
possible, so I tried it by creating a second ZEO server instance with a copy
of the Data.fs from the first instance.  I then added a second storage
server like so:
...
<zeoclient>
    server localhost:9997
    server localhost:9998
...

To my amazement, the client initially connected to localhost:9997, and when
I shut down that server, the client almost instantly connected to
localhost:9998.  I could keep switching them off and on, and the client
switched back and forth.  I immediately realized that hot failover might be
a lot easier than I expected.  However, with more testing I ran into an
issue in zodb.ZEO.ClientStorage.ClientStorage.verify_cache if there are
transactions recorded in the client cache that were not synced to the
secondary ZEO server:

    elif server_tid < cache_tid:
        message = ("%s Client has seen newer transactions than server!"
                   % self.__name__)
        logger.critical(message)
        raise ClientStorageError(message)

Would it be so bad to do something like the following?

    elif server_tid < cache_tid:
        message = ("%s Client has seen newer transactions than server!"
                   % self.__name__)
        logger.critical(message)
        self._cache.clear()
        raise ClientStorageError(message)

So an error is still raised and logged, but the cache is cleared so that
the client reconnects on the second try?  My rationale for this change is
that if you're doing a hot failover, then either a) something bad has
happened to the main server and the recovery of those transactions probably
won't happen anyway, or b) it happened during a maintenance window or the
failover is happening for convenience, and any differences between the two
servers are probably minor, such as session data.  In case b) the server
admin would be in a position to restore the main server anyway.
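
To make the intended behaviour concrete, here is a toy model of it. This is
only a sketch and does not use ZEO itself: ToyCache, StaleServerError,
verify_cache and connect_with_retry are all hypothetical stand-ins, and the
tids are made-up 8-byte strings. It assumes that an empty cache has nothing
to verify, so the second attempt goes through.

```python
class StaleServerError(Exception):
    """Stand-in for ClientStorageError in this toy model."""


class ToyCache:
    """Hypothetical client cache that only remembers the last tid it saw."""

    def __init__(self, last_tid=None):
        self.last_tid = last_tid

    def clear(self):
        self.last_tid = None


def verify_cache(cache, server_tid):
    """Raise if the cache is ahead of the server, clearing the cache first."""
    cache_tid = cache.last_tid
    if cache_tid is not None and server_tid < cache_tid:
        # The proposed change: drop the cache before reporting the error.
        cache.clear()
        raise StaleServerError(
            "Client has seen newer transactions than server!")
    return "verified"


def connect_with_retry(cache, server_tid, attempts=2):
    """Return how many attempts were needed before verification passed."""
    for attempt in range(1, attempts + 1):
        try:
            verify_cache(cache, server_tid)
            return attempt
        except StaleServerError:
            continue
    raise StaleServerError("gave up after %d attempts" % attempts)


# The cache has seen tid ...05, but the failover server only has ...03.
cache = ToyCache(last_tid=b"\x00" * 7 + b"\x05")
attempts_needed = connect_with_retry(cache, server_tid=b"\x00" * 7 + b"\x03")
```

In this model the first attempt fails and empties the cache, and the second
attempt succeeds against the older server with nothing cached.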

Here is the revision where that check was added:
http://svn.zope.org/ZODB/trunk/src/ZEO/ClientStorage.py?rev=93195&view=rev .
I noticed that the rationale was to handle 'an odd edge case', and in the
tests the comment is that 'It is bad if a client has newer data than the
server'.  If the edge case makes this proposed change a bad idea, would it
be reasonable to have the self._cache.clear call as an optional,
configurable feature?
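
As a rough illustration of the opt-in variant, the sketch below gates the
clear behind a flag that defaults to off, so the existing behaviour is
unchanged unless someone asks for it. The flag name clear_cache_on_stale is
invented for this example and is not an existing ZEO option; check_tids,
attempt and ToyClientStorageError are likewise hypothetical, and a plain
dict stands in for the client cache.

```python
import logging

logger = logging.getLogger("toy.zeo")


class ToyClientStorageError(Exception):
    """Stand-in for ClientStorageError in this sketch."""


def check_tids(server_tid, cache_tid, cache, clear_cache_on_stale=False):
    """Raise if the cache is ahead of the server.

    clear_cache_on_stale is a hypothetical option: when true, the cache
    is emptied before the error is raised so that a retry can succeed.
    """
    if cache_tid is not None and server_tid < cache_tid:
        message = "Client has seen newer transactions than server!"
        logger.critical(message)
        if clear_cache_on_stale:
            cache.clear()
        raise ToyClientStorageError(message)


def attempt(cache, server_tid, cache_tid, clear_cache_on_stale):
    """Return True if verification passed, False if it raised."""
    try:
        check_tids(server_tid, cache_tid, cache, clear_cache_on_stale)
        return True
    except ToyClientStorageError:
        return False


old_tid = b"\x00" * 7 + b"\x03"  # what the failover server has
new_tid = b"\x00" * 7 + b"\x05"  # what the client cache has seen

# Default: the error is raised and the cache is left alone.
strict_cache = {"oid-1": "cached state"}
strict_ok = attempt(strict_cache, old_tid, new_tid, False)

# Opt-in: the error is still raised, but the cache is emptied first.
lenient_cache = {"oid-1": "cached state"}
lenient_ok = attempt(lenient_cache, old_tid, new_tid, True)
```

Either way the first connection attempt fails loudly; the flag only decides
whether the stale cache survives it.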