[ZODB-Dev] Preliminary notes from fixing a bad data.fs

ethan fremen lists at mindlace.net
Fri Jan 7 22:51:24 EST 2005


Tim Peters wrote:
> [ethan fremen]
> 
>>OK, so I think I've finally gone through and un-wedged this Data.fs.
> 
> Congratulations!  It would be OK by me if you advertised here that you're
> willing to perform this service for pay <0.5 wink>.

Heh! I just might, though hopefully the code I will make available will 
make me less likely to get more gigs ;)

> I don't know why you'd assume this is deliberate behavior.

I'm sorry, I wasn't assuming it was deliberate; what I think I was 
trying to say is, if an object can't be unpickled, what can really be 
recovered from it? I mean, for example, this one had a header horked 
enough to not have an OID... Then, it was mostly a Windows Word doc, but 
it was amputated part way through. I guess I can see the value in 
extremis of getting something out of that, but my suspicion is that most 
of the time you'd rather be able to pack the Data.fs.

> Offhand, I don't know whether that's reasonable.  If the effect of your
> change is to suppress the exception and keep going, then it sure sounds like
> that opens a door to silent data loss. 

Well, what's happening at that point, as far as I can tell, is that 
referencesf is looking through the object for more object references. If 
there are none, it returns []. I think this has to do with making sure 
back-references to objects in prior transactions are not flushed.
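
To make that kind of scan concrete, here is a minimal sketch using only the 
standard pickle module. It is not ZODB's referencesf; Ref, RefPickler, 
RefCollector, find_references and dump are hypothetical names made up for 
illustration. It only shows the general idea of walking a pickle and 
collecting the object references it contains, with an empty list when there 
are none.

```python
import io
import pickle


class Ref:
    """Hypothetical stand-in for a persistent object identified by an oid."""

    def __init__(self, oid):
        self.oid = oid


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Store Ref instances as external references instead of inline data.
        return obj.oid if isinstance(obj, Ref) else None


class RefCollector(pickle.Unpickler):
    def __init__(self, file):
        super().__init__(file)
        self.oids = []

    def persistent_load(self, pid):
        # Record the reference; no real object is loaded in its place.
        self.oids.append(pid)
        return None


def dump(obj):
    """Pickle obj, writing any Ref it contains as a persistent reference."""
    buf = io.BytesIO()
    RefPickler(buf).dump(obj)
    return buf.getvalue()


def find_references(data):
    """Return the oids referenced by a pickle, or [] if it references none."""
    collector = RefCollector(io.BytesIO(data))
    collector.load()
    return collector.oids


no_refs = find_references(dump({"title": "plain"}))
two_refs = find_references(dump({"a": Ref(b"\x01"), "b": Ref(b"\x02")}))
```

Here no_refs is an empty list and two_refs holds the two oids. A damaged 
record would make collector.load() raise instead of returning.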

In my (diff attached) version, if the object can't be unpickled, I just 
return [] - the logic being, if I can't unpickle this object, does it 
matter what it refers to?

Anyway, I don't want to open the door to silent data loss, but I would 
like an "I went ahead and packed this, but there was this one object I 
couldn't pack" kind of arrangement, or at least a "pack it now, I mean 
it" option.  Anything that keeps someone out of the "the database is 
growing / it really is not showing / any sign of slowing" Willy Wonka 
scenario.
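
One way such a "packed it, but here is what I skipped" arrangement might 
look, as a rough sketch only: keep going past a record that can't be 
scanned, but remember where it was so it can be reported afterwards. 
collect_references and fake_scan are hypothetical names, and fake_scan is a 
toy stand-in for the real reference scanner; nothing here is existing ZODB 
API. The only thing taken from the attached diff is that a damaged record 
shows up as a ValueError.

```python
def collect_references(records, scan):
    """Scan every record, skipping damaged ones but keeping a report.

    records: iterable of (file_position, raw_bytes) pairs (assumed layout).
    scan: callable returning a list of oids, raising ValueError on damage.
    """
    reachable = []
    skipped = []
    for pos, raw in records:
        try:
            reachable.extend(scan(raw))
        except ValueError as exc:
            # Do not lose the failure silently: note where it happened.
            skipped.append((pos, str(exc)))
    return reachable, skipped


def fake_scan(raw):
    # Toy scanner: every byte counts as an oid, and a record starting
    # with 0xff is treated as damaged.
    if raw.startswith(b"\xff"):
        raise ValueError("damaged record")
    return list(raw)


refs, skipped = collect_references(
    [(4, b"\x01\x02"), (90, b"\xff\x00"), (200, b"\x03")],
    fake_scan,
)
for pos, reason in skipped:
    print("could not scan record at offset %d: %s" % (pos, reason))
```

The scan still finishes, but the caller ends up with a list of skipped 
offsets to log or to refuse on, rather than an empty list that looks the 
same as "this object has no references".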

>>     serials=(oserial, serial))
>>ConflictError: database conflict error (oid 0x05, serial this txn started
>>with 0x00 1900-01-01 00:00:00.000000, serial currently committed
>>0x0339fe733cf5ddf7 2001-01-20 21:39:14.287598)
> 
> 
> Woo hoo!  Congratulations again -- for the past couple days we've been
> fruitlessly speculating here about how we could possibly get a conflict
> error that claimed the "before" serial was 0.  There seem to be a couple
> reports of that per year, but doesn't look like anyone ever got anywhere
> with it.

I was a little strung out when I wrote this, but the key - I think - is 
that I told fsrecover.py to pack the db when it was done. I will try to 
reproduce this, but I don't know whether it's reproducible on anything 
other than the blinkered Data.fs I have.

I'm doing the surgery on the production Data.fs here in a few days; once 
the patient is off the table and in recovery I'll put together what 
little I've figured out.

~ethan fremen
-------------- next part --------------
604c605,608
<             return referencesf(self._file.read(dh.plen))
---
>             try:
>                 return referencesf(self._file.read(dh.plen))
>             except ValueError:
>                 return []

