[ZODB-Dev] RelStorage and PosKey errors - is this a risky hotfix?

Anton Stonor anton at headnet.dk
Thu Jan 27 06:01:14 EST 2011


Hi Shane,

Thanks for pursuing this.

> I have lots of other ideas now, but I don't know which to pursue.  I need a
> lot more information.  It would be helpful if you sent me your database to
> analyze.  Some possible causes:
>
> - Have you looked for filesystem-level corruption yet?  I asked this before
> and I am waiting for an answer.
>
>
Yep, I've checked both file system consistency and MySQL consistency, and
no errors were reported.



> - Although there is a pack lock, that lock unfortunately gets released
> automatically if MySQL disconnects prematurely.  Therefore, it is possible
> to force RelStorage to run multiple pack operations in parallel, which would
> have unpredictable effects.  Is there any possibility that you accidentally
> ran multiple pack operations in parallel?  For example, maybe you have a
> cron job, or you were setting up a cron job at the time, and you started a
> pack while the cron job was running.  (Normally, any attempt to start
> parallel pack operations will just generate an error, but if MySQL
> disconnects in just the right way, you'll get a mess.)
>
>
That's not unlikely! I've actually seen traces of packing invoked TTW,
whereas the cron job uses zodbpack. I will try to figure out whether the
POSKeyErrors started to surface right after that.
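
A rough sketch of that check, assuming the start and end time of each pack
run has already been pulled out of the logs by hand. The timestamps and the
helper names (parse, overlapping_packs) are made up for illustration; they
are not taken from RelStorage or from our real logs.

```python
from datetime import datetime


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp (hypothetical log format)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def overlapping_packs(packs):
    """Return (name_a, name_b) for every pair of pack runs that overlap.

    Each pack run is a (name, start, end) tuple.
    """
    ordered = sorted(packs, key=lambda p: p[1])  # sort by start time
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            # b starts no earlier than a; they overlap if b starts before a ends
            if start_b < end_a:
                pairs.append((name_a, name_b))
    return pairs


# Made-up pack runs: (source, start, end)
packs = [
    ("cron zodbpack", parse("2011-01-20 02:00"), parse("2011-01-20 03:30")),
    ("TTW pack", parse("2011-01-20 03:10"), parse("2011-01-20 03:45")),
    ("cron zodbpack", parse("2011-01-21 02:00"), parse("2011-01-21 03:20")),
]

print(overlapping_packs(packs))
```

Any pair this reports is a window in which two packs may have run in
parallel; the time of the first POSKeyError in the logs can then be compared
against those windows.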


> - Every SQL database has nasty surprises.  Oracle, for example, has a nice
> "read only" mode, but it turns out that mode works differently in RAC
> environments, leading to silent corruption.  As a result, we never use that
> feature of Oracle anymore.  Maybe MySQL has some nasty surprises I haven't
> yet discovered; maybe the MySQL-specific "delete using" statement doesn't
> work as expected.
>

That could also be the case. In fact, we have also seen MySQL locking up
for longer than expected, but that's another story.



> - Applications can accidentally cause POSKeyErrors in a variety of ways.
>  For example, persistent objects cached globally can cause POSKeyErrors.
>  Maybe Plone 4 or some add-on uses ZODB incorrectly.
>

I was not aware of that.

The next step here would probably be to inspect the log files further, grab
a copy of the database from before the POSKeyErrors started to appear, and
see if it is possible to recreate the incident.

Again, thanks.

Anton