[ZODB-Dev] Data file corruption and recovery

Erik Dahl edahl@zentinel.net
Thu, 13 Feb 2003 09:33:27 -0500


Yesterday a CPU failure on one box caused the sudden reboot of a ZEO
server. When the service was brought up on the other side of the
cluster, it didn't start. I figured this was due to data corruption,
and with a backup the server started fine. The problem was that the
backup was a little stale, so I wanted to try recovering the corrupt
file. I found two methods for fixing the file: running fsrecover.py, or
running tranalyzer.py and then using its output to truncate the data file.
fsrecover.py did fix my problem, but only after running for around 6
hours and generating no output other than to say that no data was lost.
The tranalyzer method never worked. My questions are:

1. How can you figure out what the server is doing when it is working on
a corrupted file? (I tried setting STUPID_LOG_SEVERITY to -300 with no
results.)

2. Any idea why taking transactions off the end of the file didn't fix
the problem?

3. Would DirectoryStorage handle this situation better, or do I need to
go to a Berkeley DB backend?
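For reference, the truncation step of the tranalyzer method amounts to
something like the sketch below. It assumes you have already read a byte
offset off tranalyzer.py's output marking the end of the last transaction
you trust; the function name truncate_copy and the offset are hypothetical,
not part of ZODB. It works on a copy so the original Data.fs is untouched.

```python
import os
import shutil


def truncate_copy(src_path, dst_path, good_end_offset):
    """Copy src_path to dst_path, then cut the copy at good_end_offset.

    good_end_offset is an assumed input: the byte position just past the
    last transaction believed to be intact (e.g. taken from tranalyzer.py
    output). The original file is never modified.
    """
    size = os.path.getsize(src_path)
    if not 0 <= good_end_offset <= size:
        raise ValueError("offset %d outside file of %d bytes"
                         % (good_end_offset, size))
    shutil.copyfile(src_path, dst_path)  # work on a copy, keep the original
    with open(dst_path, "r+b") as f:
        f.truncate(good_end_offset)      # drop every byte after the offset
    return os.path.getsize(dst_path)
```

e.g. truncate_copy("Data.fs", "Data.fs.truncated", offset), then point a
test ZEO server at the truncated copy before touching the live file.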

-EAD