[Zope-dev] Coroner's toolkit for zope, or how to figure out what went wrong.

Romain Slootmaekers romain@zzict.com
Mon, 12 Aug 2002 20:10:56 +0200


Jim Fulton wrote:
> Romain Slootmaekers wrote:
> 
>> Yo,
>>
>> we had a nasty crash of our zope server that we use for a b2b web 
>> application. The Data.fs ZODB lost a significant amount of data.
> 
> 
> What sort of crash? Was this a hardware failure, or a software failure?

Software.
Basically, the server didn't crash, but our applications couldn't
function anymore because some objects that really have to exist
were gone.

The Data.fs was NOT corrupted,
but (as far as I can tell) a bug in the conflict resolution code caused
our object (the one on which we set self._p_changed = 1) to end up empty.
This object is a container of other objects that are themselves Persistent,
and at this point we don't believe the conflict resolution mechanism
handles that case correctly.
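
To make clear what I would expect from conflict resolution on a container,
here is a minimal sketch of a three-way merge over the container's mapping.
It is not our application code and not Zope's implementation. The names
merge_states and ConflictError are made up for this example, and plain
strings stand in for whatever ZODB hands over for the child objects. In ZODB,
a merge like this would be called from an object's
_p_resolveConflict(oldState, savedState, newState) hook.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a real conflict error; sketch only."""


_MISSING = object()  # marks "key not present in this state"


def merge_states(old, saved, new):
    """Three-way merge of a container's mapping state.

    old   -- state both transactions started from
    saved -- state already committed by the other transaction
    new   -- state our transaction wants to write
    Returns the merged mapping, or raises ConflictError when both
    transactions changed the same key in different ways.
    """
    merged = dict(saved)  # start from what is already committed
    for key in set(old) | set(new):
        before = old.get(key, _MISSING)
        ours = new.get(key, _MISSING)
        theirs = saved.get(key, _MISSING)
        if ours == before:
            continue  # we did not touch this key; keep the committed value
        if theirs != before and theirs != ours:
            raise ConflictError(key)  # both sides changed it differently
        if ours is _MISSING:
            merged.pop(key, None)  # we deleted the key
        else:
            merged[key] = ours  # we added or replaced the key
    return merged


# One transaction added "c" while ours removed "b": neither change is lost.
old = {"a": "oid-1", "b": "oid-2"}
saved = {"a": "oid-1", "b": "oid-2", "c": "oid-3"}
new = {"a": "oid-1"}
merged = merge_states(old, saved, new)
```

The point is that a resolved container should never come back with fewer
untouched children than the committed state had. An empty result, as we saw,
would only be correct if one of the transactions had deleted everything.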


> 
>> At this point, we restored the Data.fs from our last backup and the 
>> server is back up and running. (breathing relieved)
>>
>> What worries me is that we have no clue whatsoever about what happened,
>> beyond the observation that somehow, somewhere we lost a whole tree 
>> of objects.
> 
> 
> Was this in the backup? Or in the damaged data file?

Nope. The data loss occurred in the 12 hours after our last backup,
so we only lost (well, it actually is quite a lot :( ) the transactions
that happened between the backup and the restore/restart.


The stack trace in the follow-up mail gives some clue as to where the
problem sits in the code (as well as the exact version of the
Zope installation).

Anyway, Murphy's law is once again proven as this thing happened on the 
first day of my vacation. :|

Sloot.