[ZODB-Dev] RelStorage and PosKey errors - is this a risky hotfix?

Shane Hathaway shane at hathawaymix.org
Thu Jan 27 05:33:39 EST 2011


On 01/24/2011 02:02 PM, Anton Stonor wrote:
> Now, I wonder why these pointers were deleted from the current_object
> table in the first place. My money is on packing -- and it might fit
> with the fact that we recently ran a pack that removed an unusually large
> number of transactions in a single pack (100.000+ transactions).
>
> But I don't know how to investigate the root cause further. Ideas?

I have meditated on this for some time now.  I mentioned earlier that I 
had an idea about packing, but after studying the design I don't see any 
way my idea could work.  The design seems to make it impossible for the 
pack code to produce an inconsistency between the object_state and 
current_object tables.
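One way to look for this kind of inconsistency in a copy of the database is to search for objects that have rows in object_state but no current_object pointer, and for pointers that reference a missing state. The sketch below assumes a simplified two-column version of both tables (only zoid and tid; the real RelStorage schema has more columns) and uses an in-memory SQLite database purely for illustration. The function name find_inconsistencies is hypothetical. The two SELECTs are plain LEFT JOINs and should carry over to MySQL, though on a large database they may be slow, and any zoid they report is a lead to investigate, not proof of corruption.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the history-preserving tables.
# Assumption: the real schema has more columns (prev_tid, md5, state, ...).
SCHEMA = """
CREATE TABLE object_state (zoid INTEGER, tid INTEGER, PRIMARY KEY (zoid, tid));
CREATE TABLE current_object (zoid INTEGER PRIMARY KEY, tid INTEGER);
"""

# Objects that have stored states but no current_object pointer.
MISSING_POINTER = """
SELECT DISTINCT s.zoid FROM object_state s
LEFT JOIN current_object c ON c.zoid = s.zoid
WHERE c.zoid IS NULL
ORDER BY s.zoid
"""

# current_object pointers whose (zoid, tid) has no matching state row.
DANGLING_POINTER = """
SELECT c.zoid, c.tid FROM current_object c
LEFT JOIN object_state s ON s.zoid = c.zoid AND s.tid = c.tid
WHERE s.zoid IS NULL
ORDER BY c.zoid
"""


def find_inconsistencies(conn):
    """Return (zoids missing a pointer, list of dangling (zoid, tid) pointers)."""
    missing = [row[0] for row in conn.execute(MISSING_POINTER)]
    dangling = list(conn.execute(DANGLING_POINTER))
    return missing, dangling


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object_state VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 10), (3, 30)])
    # zoid 2 has lost its pointer; zoid 3 points at a tid with no state.
    conn.executemany("INSERT INTO current_object VALUES (?, ?)",
                     [(1, 20), (3, 31)])
    print(find_inconsistencies(conn))
```

The first query corresponds to the symptom reported here (pointers gone from current_object while the states remain); the second catches the opposite case.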

I have lots of other ideas now, but I don't know which to pursue.  I 
need a lot more information.  It would be helpful if you sent me your 
database to analyze.  Some possible causes:

- Have you looked for filesystem-level corruption yet?  I asked this 
before and I am waiting for an answer.

- Although there is a pack lock, that lock unfortunately gets released 
automatically if MySQL disconnects prematurely.  Therefore, it is 
possible to force RelStorage to run multiple pack operations in 
parallel, which would have unpredictable effects.  Is there any 
possibility that you accidentally ran multiple pack operations in 
parallel?  For example, maybe you have a cron job, or you were setting 
up a cron job at the time, and you started a pack while the cron job was 
running.  (Normally, any attempt to start parallel pack operations will 
just generate an error, but if MySQL disconnects in just the right way, 
you'll get a mess.)
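To make the parallel-pack scenario concrete, here is a toy model (not RelStorage or MySQL code) of a server whose named locks belong to a client session. It assumes the pack lock behaves like a MySQL GET_LOCK() named lock, which MySQL releases implicitly when the session ends, normally or abnormally. The LockServer class and the start_pack function are hypothetical.

```python
class LockServer:
    """Toy server with session-scoped named locks.

    Assumption: mimics a named lock that disappears when the owning
    session disconnects; it is an illustration, not a real driver.
    """

    def __init__(self):
        self._owners = {}  # lock name -> session id

    def get_lock(self, session, name):
        """Try to acquire without waiting; return True on success."""
        owner = self._owners.get(name)
        if owner is None or owner == session:
            self._owners[name] = session
            return True
        return False

    def disconnect(self, session):
        """Drop every lock held by a session, as happens on disconnect."""
        for name in [n for n, s in self._owners.items() if s == session]:
            del self._owners[name]


def start_pack(server, session, running):
    """Hypothetical pack launcher: refuse to start without the pack lock."""
    if not server.get_lock(session, "pack"):
        raise RuntimeError("another pack is already running")
    running.add(session)


if __name__ == "__main__":
    server = LockServer()
    running = set()
    start_pack(server, "cron", running)
    try:
        start_pack(server, "manual", running)
    except RuntimeError as exc:
        print("refused:", exc)  # the normal, safe outcome
    # The cron job's connection drops, but its pack work keeps going...
    server.disconnect("cron")
    start_pack(server, "manual", running)
    print(sorted(running))  # both packs now count as running
```

The point of the model is that holding the lock only proves a session is alive, not that the pack which took the lock has stopped working.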

- Every SQL database has nasty surprises.  Oracle, for example, has a 
nice "read only" mode, but it turns out that mode works differently in 
RAC environments, leading to silent corruption.  As a result, we never 
use that feature of Oracle anymore.  Maybe MySQL has some nasty 
surprises I haven't yet discovered; maybe the MySQL-specific "delete 
using" statement doesn't work as expected.

- Applications can accidentally cause POSKeyErrors in a variety of ways. 
For example, persistent objects cached globally (such as in a 
module-level variable and reused across connections) can cause 
POSKeyErrors.  Maybe Plone 4 or some add-on uses ZODB incorrectly.

Shane

