[ZODB-Dev] Recovering from BTree corruption

Jim Fulton jim at zope.com
Thu Sep 27 11:47:07 EDT 2007


On Sep 12, 2007, at 10:28 AM, Jim Fulton wrote:
...
>>>>   - checkbtrees.py
>>>>   - fstest.py
>>>
>>> There's an fsrefs script that checks internal references I believe.
>>
>> fsrefs.py shows loads of problems in both the data.fs and the  
>> resources.fs.
>> probably > 200 entries per database. i.e.
>>
>> oid 0xD87110L BTrees._OOBTree.OOBucket
>> last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
>> refers to invalid objects:
>>         oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing:  
>> '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: '<unknown>'
>>         oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing:  
>> '<unknown>'
...

>>   - How do I tell if something is a reference to another database?
>
> I don't know how to do this with fsrefs.  I'm not 100% sure that  
> fsrefs recognizes cross-database references.

I did a little looking at fsrefs.  It doesn't analyze the types of  
references. It just tries to load objects.  This approach, aside from  
being less informative than it should be, totally fails with multiple  
databases. Cross-database references will always be reported as  
"missing" by fsrefs.
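
To make the local/remote distinction concrete, here is a minimal
sketch of classifying a single persistent reference.  It is not part
of fsrefs.  The reference shapes are assumptions taken from the
comments in ZODB's serialize module as I understand them, and
classify_reference is a hypothetical name.

```python
def classify_reference(ref):
    """Return ('local', None, oid) or ('remote', database_name, oid).

    Assumed reference shapes (check them against your ZODB version):
      oid                                      local reference
      (oid, class_info)                        local, with class metadata
      ['w', (oid,)]                            local weak reference
      ['w', (oid, database_name)]              cross-database weak reference
      ['n', (database_name, oid)]              cross-database reference
      ['m', (database_name, oid, class_info)]  cross-database reference
    """
    if isinstance(ref, bytes):
        return ('local', None, ref)
    if isinstance(ref, tuple):
        return ('local', None, ref[0])
    if isinstance(ref, list):
        kind, args = ref
        if kind == 'w':
            if len(args) == 1:
                return ('local', None, args[0])
            return ('remote', args[1], args[0])
        if kind in ('n', 'm'):
            return ('remote', args[0], args[1])
    raise ValueError('unrecognized reference: %r' % (ref,))
```

The tuples in the fsrefs output quoted above, such as (oid, None),
would be the second shape, so they look like local references.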

....

> I'll try to make some time in the next few days to look at this issue.

Man it's hard to make time ...

>
> I'll look at fsrefs a bit more closely to:
>
>   - make sure it understands cross-database references, and

It doesn't.

>   - Make sure it reports whether missing references are local or  
> remote.

Haha ;)

> I'd like to decide what to do next based on this investigation.  In  
> particular, I want to be sure if the problems you are having are  
> actually due to cross-database reference issues.
>
> I'll also look at writing a tool that might be able to recover lost  
> objects from backup databases.  The idea is that a tool would scan  
> a database for missing oids save the list to files, separating  
> references to different databases.  Then there'd be another tool  
> that would read this list and a list of old database files and scan  
> the files looking for oids in the list and extracting records if  
> they are found.

I spent some time on an analysis tool. See:

   http://svn.zope.org/zc.fsutil/branches/dev/

and especially:

   http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil/references.txt?view=auto

It will help you figure out whether you have holes, and it separates
cross-database references from local ones.  You may have to work a
little, though.  The data structures produced will allow you to analyze
broken cross-database references in a way that should be fairly
obvious.  (Hint: you'll have to generate data for each database and
make sure that all of the oids mentioned in the set of cross-database
references are actually present in the named databases.)
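
The check in the hint can be sketched as follows.  This is only an
illustration with plain dicts and sets; the actual data structures
produced by the tool are described in references.txt and may differ.
The function name and the two argument layouts are assumptions.

```python
def find_broken_references(present, cross_refs):
    """Return sorted (source_db, target_db, oid) triples that cannot be resolved.

    present:    {database_name: set of oids stored in that database}
    cross_refs: {source_db: set of (target_db, oid) pairs it refers to}
    """
    broken = []
    for source_db, refs in cross_refs.items():
        for target_db, oid in refs:
            # An unknown target database counts as having no oids at all.
            if oid not in present.get(target_db, set()):
                broken.append((source_db, target_db, oid))
    return sorted(broken)


# Hypothetical data for two databases named 'data' and 'resources'.
present = {
    'data': {b'\x01', b'\x02'},
    'resources': {b'\x0a'},
}
cross_refs = {
    'data': {('resources', b'\x0a'), ('resources', b'\x0b')},
}
print(find_broken_references(present, cross_refs))
```

Here the reference from 'data' to oid b'\x0b' in 'resources' is
reported, because that oid is not in the 'resources' set.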

A major challenge is handling large databases.  We have databases
with millions of objects, and I kept having to trim the amount of data
analyzed to fit the data structures in memory.  It is interesting to
look at the evolution of the data structures over the last couple of
days as I tried to cope with scale.

The obvious next step is to store data in a database rather than  
memory.  This will slow things down, but will allow me to work with  
arbitrarily large databases and keep richer data structures.
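
As a rough illustration of that trade-off, here is a sketch that keeps
the reference data on disk and asks the database for the missing
targets.  sqlite3 is swapped in purely for illustration; nothing above
says which database would be used, and the table layout and function
names are assumptions.

```python
import sqlite3


def open_index(path):
    """Open (or create) an on-disk index of stored oids and references."""
    conn = sqlite3.connect(path)
    conn.execute('CREATE TABLE IF NOT EXISTS present (oid BLOB PRIMARY KEY)')
    conn.execute('CREATE TABLE IF NOT EXISTS refs (source BLOB, target BLOB)')
    return conn


def missing_targets(conn):
    """Return referenced oids that have no record, in sorted order."""
    rows = conn.execute(
        'SELECT DISTINCT target FROM refs '
        'WHERE target NOT IN (SELECT oid FROM present) '
        'ORDER BY target')
    return [row[0] for row in rows]
```

Passing ':memory:' as the path gives a throwaway index for
experimenting; a file path gives one that is not limited by RAM.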

Assuming that you still care about this (you've been quiet :), I  
suggest using this tool to find the holes. (You can also use it to  
find the objects that refer to the missing objects.)

Then, once you've found the missing oids, you should go to backups,  
open file storages on the backups and, if the oids are present, copy  
the pickles to the database under repair.  Something like:

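   import transaction

   # Read the missing records from the backup storage.
   pickles = [backup_storage.load(oid, '')[0] for oid in oids]
   t = transaction.begin()
   s = database_with_hole  # storage for the database under repair
   s.tpc_begin(t)
   # '\0'*8 is the serial used for objects new to this storage.
   for oid, p in zip(oids, pickles):
       s.store(oid, '\0'*8, p, '', t)
   s.tpc_vote(t)
   s.tpc_finish(t)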
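
A slightly more defensive variant, as a sketch only: it tries several
backups in the order given and skips oids that none of them has,
instead of failing on the first missing one.  It assumes the storages
provide the same load/store/tpc_* calls used above and that a missing
oid raises a KeyError subclass; copy_missing_records is a hypothetical
helper, not something ZODB ships.

```python
Z64 = b'\0' * 8  # serial for an object that is new to the target storage


def copy_missing_records(backups, target, oids, txn):
    """Copy each oid's record from the first backup that has it.

    Returns the list of oids that were found in none of the backups.
    """
    found = {}
    for oid in oids:
        for backup in backups:
            try:
                found[oid] = backup.load(oid, '')[0]
                break
            except KeyError:  # assumption: POSKeyError subclasses KeyError
                continue
    if found:
        # Write everything that was recovered in one storage transaction.
        target.tpc_begin(txn)
        for oid, data in found.items():
            target.store(oid, Z64, data, '', txn)
        target.tpc_vote(txn)
        target.tpc_finish(txn)
    return [oid for oid in oids if oid not in found]
```

The oids that come back are the ones that still need repair by other
means.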

If you don't have the data in backups, then you might be able to use  
information about the objects referring to the missing objects to  
repair the referring objects by hand by deleting the references to  
missing objects.

Hope this helps.

Jim

--
Jim Fulton
Zope Corporation



