[ZODB-Dev] Data lossage

Paul Winkler pw_lists at slinkp.com
Wed Sep 3 13:09:22 EDT 2003


I'm not entirely sure if this is really a ZODB issue - could
conceivably be some obscure bug in Zope or CMF, I guess -
but this seems like the right place to start...

Version information:
was Zope 2.6.2b3, now Zope 2.6.2b6
ZEO from ZODB-3.1.2
Python 2.1.3
Linux 2.4.7-10 SMP (some flavor of Red Hat)
glibc-2.2.4-32

The problem: 
I've experienced intermittent data loss over the past few weeks.
There were two instances, the first involving about 10 different
objects. Both times, the pattern was something like:

1) content administrator changes something
2) a few days pass
3) content administrator goes back to the object and reports
that changes from #1 have vanished - object has reverted
to an earlier state (or, if it was new, the object has vanished).

Close inspection of the access log reveals no deliberate
actions that would get rid of the objects in question.

I've been running every tool I can get my hands on to see
if I have ZODB corruption. This FileStorage is rather large,
currently just over 2 GB; it has survived numerous Zope upgrades and has 
been through the wringer with fsrecover.py on a few icky occasions 
(lots of POSKeyErrors in some very old objects which may have 
been caused by some nasty Versions blunders that happened before 
I arrived on the scene).

I have yet to try the tools that Dieter Maurer sent me; that is
my next step.

Some interesting points:

* The FileStorage has not been packed at any time during the period
in question. 

* fstest.py reports no problems.

* checkbtrees.py reports no problems (even the version Tim just
posted to this list).

* fsrecover.py reports "0 bytes removed during recovery",
and produces output with the same length as the input,
but the output does *not* match the input - cmp reports a
difference (about 99.9% of the way through the file):
$ cmp Data.fs.FIXED Data.fs
Data.fs.FIXED Data.fs differ: char 2006756625, line 8959559

Is this suspicious?  What could account for the difference, when
fsrecover.py claims not to find any errors?
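
cmp only reports the first differing byte, so it does not show
whether the two files differ in one small region or in many places.
A minimal sketch for listing every differing byte range follows. It
is illustrative only: it targets a current Python 3 rather than the
Python 2.1.3 listed above, and the function name diff_ranges and the
chunk size are assumptions, not part of any ZODB tool.

```python
def diff_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte offsets, end exclusive, where two files differ.

    Offsets are 0-based. Only the length common to both files is
    compared; a difference in file size is not reported here.
    """
    ranges = []
    start = None   # start offset of the differing run currently open, if any
    offset = 0     # offset of the first byte of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Whole chunk matches: close any run carried over from before.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                # Chunk differs somewhere: walk it byte by byte.
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges


if __name__ == "__main__":
    for lo, hi in diff_ranges("Data.fs.FIXED", "Data.fs"):
        print("bytes %d-%d differ (%d bytes)" % (lo, hi - 1, hi - lo))
```

Note that cmp numbers bytes from 1, so the "char 2006756625" above
corresponds to offset 2006756624 in this script's output.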

* A small script invoking copyTransactionsFrom produces output
of the same size as the input - BUT the output does not match
the input; instead, the output matches the output of fsrecover.py!
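
For readers who want to repeat the copy, a script of this kind might
look like the sketch below. It is not the exact script that was run:
the function name copy_storage and the output path are placeholders,
and it is written for a current Python 3 and ZODB rather than the
versions listed above. It relies only on the copyTransactionsFrom
method that FileStorage inherits from BaseStorage.

```python
def copy_storage(src_path, dst_path):
    """Copy every transaction from one FileStorage into a newly created one."""
    # Imported inside the function so this file can be loaded
    # even on a machine where ZODB is not installed.
    from ZODB.FileStorage import FileStorage

    src = FileStorage(src_path, read_only=True)
    try:
        dst = FileStorage(dst_path, create=True)
        try:
            # Replays each transaction record from src into dst, in order.
            dst.copyTransactionsFrom(src)
        finally:
            dst.close()
    finally:
        src.close()


if __name__ == "__main__":
    # "Data.fs.COPY" is a hypothetical output name.
    copy_storage("Data.fs", "Data.fs.COPY")
```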

* I wanted to try migrate.py from ZODB3 CVS but it doesn't
seem to be compatible with ZODB3-3.1.2 ... requires an
import of StorageTypes which doesn't exist.


* fsrefs.py from ZODB3 CVS reports several issues - could this
be related to lingering Versions nastiness? I long ago deleted
all Version instances and packed the storage several times since
then, but at this point I'm suspicious of everything :-\

oid 0x12877 OFS.Image.Image
last updated: 2002-06-26 15:19:50.642602, tid=0x345B317D81339DDL
refers to invalid object:
	oid 0x12a33 missing: 'OFS.Image.Pdata'

oid 0x165de OFS.Image.Image
last updated: 2002-07-19 17:51:59.297664, tid=0x3463AAFFD00DCA2L
refers to invalid object:
	oid 0x16688 missing: 'OFS.Image.Pdata'

oid 0xd629 OFS.Image.Image
last updated: 2002-06-17 14:49:57.018078, tid=0x3458059F346F22AL
refers to invalid object:
	oid 0xd688 missing: 'OFS.Image.Pdata'

oid 0x73511 failed to load
oid 0x73512 failed to load
oid 0x73513 failed to load
oid 0x73514 failed to load
oid 0x73515 failed to load
oid 0x73516 failed to load

When I run fsrefs.py with -v, it reports POSKeyErrors on 0x73511 - 0x73516.
If I have POSKeyErrors, why didn't fstest.py detect them?

* If I run fsrefs.py on Data.fs.FIXED (the one that was produced by
fsrecover.py), I get exactly the same output. 
So apparently these are POSKeyErrors that fsrecover.py does not 
detect or fix.
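
One way to cross-check the "failed to load" lines independently of
fsrefs.py is to ask the storage for each oid directly and see which
ones raise POSKeyError. The sketch below is an assumption about how
that could be done, not one of the ZODB tools: the names oid_from_hex
and find_unloadable are made up for illustration, and it targets a
current Python 3 and ZODB.

```python
import struct


def oid_from_hex(text):
    """Convert an oid printed as '0x73511' to the 8-byte big-endian form ZODB uses."""
    return struct.pack(">Q", int(text, 16))


def find_unloadable(storage_path, hex_oids):
    """Return the entries of hex_oids whose current record cannot be loaded."""
    # Imported here so the helper above is usable without ZODB installed.
    from ZODB.FileStorage import FileStorage
    from ZODB.POSException import POSKeyError

    missing = []
    storage = FileStorage(storage_path, read_only=True)
    try:
        for text in hex_oids:
            try:
                # The empty string is the "no version" argument.
                storage.load(oid_from_hex(text), "")
            except POSKeyError:
                missing.append(text)
    finally:
        storage.close()
    return missing


if __name__ == "__main__":
    suspects = ["0x73511", "0x73512", "0x73513",
                "0x73514", "0x73515", "0x73516"]
    print(find_unloadable("Data.fs", suspects))
```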

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's LATIN TURQUOISE JESUS!
(random hero from isometric.spaceninja.com)


