[ZODB-Dev] Tracking down causes for the recent POSKeyError problems we get ...

Joachim Werner joe@iuveno-net.de
Sun, 12 Jan 2003 13:28:19 +0100


Hi!

A while ago I posted about the frequent POSKeyErrors we are getting. Before 
blaming FileStorage I'd like to track down the possible causes at the 
application level.

For that I'd need some hints:

Running fstest.py reports errors like these (I modified it so that it does 
not stop at the first error it encounters):

606435173 object serialno 0x0348d1012f1eebc4 does not match transaction id 0x0348d101f2f4aff7

622613103 object serialno 0x0347852012dcb4a2 does not match transaction id 0x0348f77b1bf2f866

622613454 object serialno 0x034800c732d74de6 does not match transaction id 0x0348f77b1bf2f866
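The invariant behind these messages can be pictured with a small model: every data record stores a serial in its header, and that serial is expected to equal the id of the transaction the record was written in. The sketch below models only that invariant; `DataRecord`, `Transaction` and `check_serials` are hypothetical names, not fstest.py's actual code, and the leading number is assumed to be the file offset of the offending data record. Like the modified fstest.py, it reports every mismatch instead of stopping at the first one.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DataRecord:
    pos: int       # assumed: file offset of the data record in Data.fs
    oid: bytes     # 8-byte object id
    serial: bytes  # 8-byte serial stored in the record header


@dataclass
class Transaction:
    tid: bytes     # 8-byte transaction id
    records: List[DataRecord] = field(default_factory=list)


def check_serials(transactions: List[Transaction]) -> Iterator[str]:
    """Yield a message for every record whose serial differs from its txn's tid.

    Keeps going after the first mismatch, like the modified fstest.py.
    """
    for txn in transactions:
        for rec in txn.records:
            if rec.serial != txn.tid:
                yield "%d object serialno 0x%s does not match transaction id 0x%s" % (
                    rec.pos, rec.serial.hex(), txn.tid.hex())


txn = Transaction(tid=bytes.fromhex("0348d101f2f4aff7"))
txn.records.append(DataRecord(606435173, b"\0" * 8, bytes.fromhex("0348d1012f1eebc4")))
for msg in check_serials([txn]):
    print(msg)
# prints: 606435173 object serialno 0x0348d1012f1eebc4 does not match transaction id 0x0348d101f2f4aff7
```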

I haven't yet found the time to dive into ZODB internals. What I need to 
know is how to get Zope (or a standalone Python script) to return the 
actual object involved, e.g. formatted as an XML export. I want to be able 
to see whether the errors correlate with a particular meta_type, a type of 
transaction (e.g. copy & paste, object creation, ...) or a point in time.
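A first step toward finding the object is recovering its oid from the offset fstest.py prints. The sketch below assumes that offset points at the start of a data record and that the record header has the 42-byte big-endian layout described in the FileStorage source (oid, serial, previous-revision position, transaction position, version length, data length). `read_data_header` is a hypothetical helper, not part of ZODB; it should be run against a copy of Data.fs, not the live file.

```python
import struct
from typing import BinaryIO, Dict, Union

# Assumed FileStorage data-record header: oid, serial, prev, tloc, vlen, plen
DATA_HDR = ">8s8sQQHQ"
DATA_HDR_LEN = struct.calcsize(DATA_HDR)  # 42 bytes, no padding with ">"


def read_data_header(f: BinaryIO, pos: int) -> Dict[str, Union[str, int]]:
    """Decode the data-record header that starts at file offset pos."""
    f.seek(pos)
    raw = f.read(DATA_HDR_LEN)
    if len(raw) != DATA_HDR_LEN:
        raise ValueError("truncated data record header at offset %d" % pos)
    oid, serial, prev, tloc, vlen, plen = struct.unpack(DATA_HDR, raw)
    return {
        "oid": "0x" + oid.hex(),        # object id to look up in the storage
        "serial": "0x" + serial.hex(),  # the serialno fstest.py complains about
        "prev": prev,                   # offset of the previous revision (0 if none)
        "tloc": tloc,                   # offset of the enclosing transaction header
        "vlen": vlen,                   # length of the version name
        "plen": plen,                   # length of the pickle that follows
    }
```

Usage would be opening the copy in binary mode and calling `read_data_header(f, 606435173)`. The `tloc` field leads back to the transaction header, whose user and description fields typically record, in Zope, who made the change and which request path wrote it; that is one way to see what kind of operation is involved.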

I have not verified this yet, but I think the errors fstest.py reports are 
related to the problems we keep running into: it seems that certain objects 
get lost when the database is packed. They are still referenced somewhere, 
but their records have been removed, which causes a POSKeyError as soon as 
Zope tries to load the object.
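The suspected situation is a dangling reference: a record still points at an oid for which no record is left. Given a mapping from each oid to the oids its pickle references, such references can be listed as in the sketch below. Oids are modeled as plain integers, the reference lists are assumed to have already been extracted from the pickles (that step is not shown), and `find_dangling` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def find_dangling(records: Dict[int, Iterable[int]]) -> List[Tuple[int, int]]:
    """Return sorted (referrer, missing_oid) pairs for references with no record.

    Loading any missing_oid listed here would fail with a POSKeyError.
    """
    dangling = []
    for oid, refs in records.items():
        for ref in refs:
            if ref not in records:
                dangling.append((oid, ref))
    return sorted(dangling)


# oid 1 still references oid 3, whose record is gone
print(find_dangling({0: [1, 2], 1: [3], 2: []}))  # prints [(1, 3)]
```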

I'd appreciate any hints that help me understand what is going on:

- How does the packing algorithm find out if an object can be removed?

- How can errors like the ones above ("serialno does not match transaction 
id") occur?
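On the first question: conceptually, packing keeps what is reachable from the root object (oid 0) and treats the rest as garbage. The sketch below shows only that mark phase over current records; the real FileStorage pack also has to handle revisions written after the pack time and versions, which are left out, and `reachable` and `pack_candidates` are hypothetical names. It also shows how the suspected failure could arise: if a reference is not visible to the traversal, its target looks unreachable and would be dropped even though something still points at it.

```python
from collections import deque
from typing import Dict, Iterable, Set

ROOT_OID = 0  # the ZODB root object has oid 0


def reachable(records: Dict[int, Iterable[int]], root: int = ROOT_OID) -> Set[int]:
    """Mark phase: breadth-first walk over references starting at the root."""
    seen = {root}
    queue = deque([root])
    while queue:
        oid = queue.popleft()
        for ref in records.get(oid, ()):  # dangling refs have no entry to expand
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def pack_candidates(records: Dict[int, Iterable[int]]) -> Set[int]:
    """Oids that have a record but are unreachable from the root."""
    return set(records) - reachable(records)


db = {0: [1], 1: [2], 2: [], 3: [4], 4: [3]}  # 3 and 4 form an orphaned cycle
print(sorted(pack_candidates(db)))  # prints [3, 4]
```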

I am really scared: for years I didn't have a single case of ZODB data 
corruption, and now we are getting them on a weekly, sometimes daily, 
basis ...

Cheers

Joachim