[ZODB-Dev] zeoraid backends dropping out

Chris Withers chris at simplistix.co.uk
Wed Nov 11 05:34:19 EST 2009


Hi All,

Since this is the first post to this list on the issue, the setup I have is:

zeoraid1 and zeoraid2 are zeoraid servers
zeo1 and zeo2 are normal storage servers serving two file storages: 
packed and unpacked.

zeoraid1 and zeo1 are on one box, zeoraid2 and zeo2 are on the other 
box. A variety of zeo clients connect to the zeoraid servers, and each 
client has the IP addresses of both zeoraid servers in its clientstorage 
config, so they round-robin between them in the event of a zeoraid 
server failure.

I'm using zodb 3.9.3 and zeoraid 1.0b3 for the servers, the clients are 
all currently Zope 2.9.8.

So, diagrammatically, the configuration is roughly:

            zeoraid1 -> zeo1 zeo2
clients ->
            zeoraid2 -> zeo1 zeo2

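In case it helps, each client's storage section looks roughly like the 
following zope.conf fragment (the port numbers here are made up; the 
relevant part is the two "server" lines, one per zeoraid):

```
<zodb_db main>
  mount-point /
  <zeoclient>
    # Both zeoraid addresses; ClientStorage fails over between them.
    server zeoraid1:8100
    server zeoraid2:8100
    # Name of the storage to connect to on the zeoraid server.
    storage packed
  </zeoclient>
</zodb_db>
```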
Now, last week, the packed storage on zeo1 started showing as failed in 
both zeoraids. After that, a load of data was committed to the storage, 
all of which ended up only on zeo2. Recovering the 10GB or so of data 
took a couple of days.

I just went to check that the recovery was finished, only to find the 
unpacked storage on zeo2 is now showing as failed in zeoraid2 (which is 
the zeoraid currently in use by the clients) and when I try to do 
"bin/zeoraid -S packed details" against zeoraid1, I get:

(20659) CW: error in notifyConnected (('127.0.0.1', 6001))
Traceback (most recent call last):
   File "ZEO/zrpc/client.py", line 476, in notify_client
     self.client.notifyConnected(self.conn)
   File "ZEO/ClientStorage.py", line 621, in notifyConnected
     self.verify_cache(stub)
   File "ZEO/ClientStorage.py", line 1308, in verify_cache
     ltid = server.lastTransaction()
   File "ZEO/ServerStub.py", line 89, in lastTransaction
     return self.rpc.call('lastTransaction')
   File "ZEO/zrpc/connection.py", line 703, in call
     raise inst # error raised by server
RAIDClosedError: Storage has been closed.

This feels exactly like what happened with the packed storage on zeo1 
before. I looked in the zeoraid and zeo event logs on both servers and 
didn't see anything logged as an error. The only weird (and worrying!) 
thing I did see was in the zeo2 event log:

------
2009-11-11T02:52:19 WARNING ZODB.FileStorage Packed.fs truncated, 
possibly due to damaged records at 14487840095

...but the packed storage shows as optimal in both zeoraid1 and 
zeoraid2, so what does this mean?

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk
