[ZODB-Dev] RE: "Time travel" conflict errors in Zope

Mon Dec 20 10:57:56 EST 2004

On Mon, 20 Dec 2004 10:37:58 -0500, Tim Peters wrote:
> [Malcolm Cleaton]
>> One of our servers, after the disk filled up, started showing some "time
>> travel" conflict errors.
>>
>> When trying to commit a change, the error shown would say that a conflict
>> occurred, and would show two dates, which appeared to be the date the
>> object was loaded and the (later) date some other thread had changed it.
> 
> A ConflictError message doesn't say when the object was loaded.  WRT your
> later message, "serial this txn started with" is the most recent time in the
> past at which the object in question was committed, relative to the time the
> failing transaction began.  For example, if I last changed a particular
> object in 1976, and started a transaction today that tried to modify that
> object, but got a conflict error, it's expected that the message would say
> the serial I started with was from 1976.  If it's a long time in the past,
> that just means the object was last modified a long time in the past.

I'm with you. This is good to know.
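
To make that concrete: a ZODB serial is an 8-byte transaction id that
encodes a timestamp, which is why the conflict message can show it as a
date. Below is a minimal sketch of decoding one by hand. It assumes the
layout used by the TimeStamp class (high four bytes count minutes since
1900 with 31-day months, low four bytes hold the seconds as a fraction of
a minute). The function names are made up for illustration and are not
part of ZODB.

```python
import struct
from datetime import datetime, timedelta


def datetime_to_serial(dt: datetime) -> bytes:
    """Pack a naive datetime into an 8-byte serial (assumed TimeStamp layout)."""
    # High word: minutes since 1900, treating every month as 31 days long.
    minutes = ((((dt.year - 1900) * 12 + dt.month - 1) * 31
                + dt.day - 1) * 24 + dt.hour) * 60 + dt.minute
    # Low word: seconds within the minute, scaled to the range 0 .. 2**32.
    seconds = dt.second + dt.microsecond / 1e6
    fraction = int(seconds * 2**32 / 60)
    return struct.pack(">II", minutes, fraction)


def serial_to_datetime(serial: bytes) -> datetime:
    """Unpack an 8-byte serial back into a naive datetime."""
    if len(serial) != 8:
        raise ValueError("a serial must be exactly 8 bytes")
    minutes, fraction = struct.unpack(">II", serial)
    rest, minute = divmod(minutes, 60)
    rest, hour = divmod(rest, 24)
    rest, day0 = divmod(rest, 31)
    year0, month0 = divmod(rest, 12)
    seconds = fraction * 60.0 / 2**32
    whole = datetime(year0 + 1900, month0 + 1, day0 + 1, hour, minute)
    return whole + timedelta(seconds=seconds)
```

Because the serial is big-endian, comparing two serials as byte strings
orders them by commit time, so an object last committed in 1976 carries a
serial that sorts before any serial written today.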

>> The strange issue was, the date the error claimed the object had been
>> loaded was several hours in the past,
> 
> Now that you know it never tries to say anything about when the object was
> loaded, does that still seem strange to you?
> 
>> and remained there even after restarting the server.
> 
> If no change to the object was successfully committed, then the serial of
> the most recently committed change would not change.

Yes, but this is in the context of repeated conflict errors on repeated
attempts to change the same object, each of which began after the
timestamp given for the later conflicting commit.

>> We were able to get rid of the errors by truncating the Data.fs. They
>> later returned immediately after a cache rotation, but we deleted the
>> cache files.
> 
> With what effect?  I'm really not clear on what "the errors" are here,
> but whatever they were, did deleting the cache files make (or appear to
> make) them "go away"?

Yes. The errors went away, and we were once more able to edit the objects
in question.

> Have you run fstest and fsrefs?  ZODB/ZEO aren't normally tested under
> "oops!  we ran out of disk space" conditions, and it's plausible
> something got wedged when that happened to you.  More info about fstest
> and fsrefs here:
> 
>     http://zope.org/Wikis/ZODB/FileStorageBackup

Yes, I ran fstest and fsrefs, but not on the original, non-truncated
Data.fs. I ran them only on the latest one, after the errors reappeared
following a ZEO client cache file rotation. There were no problems with
the Data.fs at that time.

My current theory, in the light of your explanations, is that the
disk-full event caused the ZEO client cache to miss some changes, and that
everything else we saw is the direct result of this.

I would guess that, for efficiency reasons, the cache validation
procedure doesn't think too hard about the possibility of object changes
in the past that have already been missed, and that therefore the only way
out of such situations is to delete the cache files.
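
A toy model of that theory, to show why every retry would fail until the
cache is thrown away. This is not how ZEO is implemented; the classes and
names below are invented for illustration, and serials are plain integers
rather than real transaction ids.

```python
class ToyConflictError(Exception):
    """Raised when a commit starts from an out-of-date serial."""


class ToyServer:
    def __init__(self):
        self.serials = {}  # oid -> serial of the last committed change
        self.clock = 0     # stands in for the commit timestamp

    def commit(self, oid, started_with):
        current = self.serials.get(oid, 0)
        if started_with != current:
            raise ToyConflictError((oid, started_with, current))
        self.clock += 1
        self.serials[oid] = self.clock
        return self.clock


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # oid -> serial the client believes is current

    def load(self, oid):
        # A cache hit is trusted without asking the server again.
        if oid not in self.cache:
            self.cache[oid] = self.server.serials.get(oid, 0)
        return self.cache[oid]

    def modify(self, oid):
        self.cache[oid] = self.server.commit(oid, self.load(oid))


def demo():
    server = ToyServer()
    a, b = ToyClient(server), ToyClient(server)
    a.modify("x")      # serial 1
    b.load("x")        # b caches serial 1
    a.modify("x")      # serial 2; b never hears about it (the "disk full")
    conflicts = 0
    for _ in range(3):  # every retry starts from the stale serial 1
        try:
            b.modify("x")
        except ToyConflictError:
            conflicts += 1
    b.cache.clear()    # the equivalent of deleting the cache files
    b.modify("x")      # now succeeds
    return conflicts, server.serials["x"]
```

In this model demo() returns (3, 3): three conflicts in a row from the
stale cache entry, then a successful commit once the cache is cleared.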

I'd guess further that truncating the ZODB was entirely the wrong thing to
do, and that deleting the cache files alone would have been sufficient,
but it does now make sense why rolling the ZODB back to before the cache
went wrong had some effect.

Thanks,
Malcolm.

-- 

    [] j a m k i t
      web solutions for charities

         malcolm cleaton
T:  020 7549 0520
F:  020 7490 1152
M:  07986 563852
W: www.jamkit.com