[ZODB-Dev] RE: ZEO server hang

Jeremy Hylton jeremy at alum.mit.edu
Thu Aug 28 19:58:29 EDT 2003


On Thu, 2003-08-28 at 13:21, Ehle, Brandon wrote:
> From the lack of response, does that mean that nobody knows why the
> ZEO server is hanging, or am I on the wrong mailing list?

Neither.  I'm sorry I didn't respond to your first message.  I've seen
that problem recently in a couple of other cases, and I'm trying to
track down the problem.  The thing is -- I'm tracking several different
bugs -- and I haven't made a lot of progress on this particular one.

> From looking into the code, it looks like the server is trying to
> start another transaction, but a different transaction is still
> holding a lock.  This could possibly be true if self.locked is cleared
> without self.timeout.end(self) being called.  The only way I can see
> this happening is a race in the window after self.locked is cleared
> but before self.timeout.end(self) is called.  As you can see, the
> message about the Transaction Lock being released appears immediately
> after the assertion saying that the TimeoutThread hasn’t been
> unlocked yet.  If this is the case, I think if we change the code
> for _clear_transaction() such that the self.locked = 0 happens after
> the self.timeout.end(self) call it should fix this case.
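
For readers following the thread, the proposed reordering can be illustrated
with a small single-threaded sketch.  This is not the ZEO source: only the
names self.locked, self.timeout.end(self) and _clear_transaction() come from
the message above.  The TimeoutTracker and Storage classes, the tpc_begin()
method, and the between_steps hook (which stands in for whatever might run
inside the suspected window) are hypothetical, and the sketch assumes such a
window exists at all.

```python
class TimeoutTracker:
    """Hypothetical stand-in for the timeout thread's bookkeeping."""

    def __init__(self):
        self.client = None

    def begin(self, client):
        # Mirrors the kind of assertion described in the message: a new
        # transaction may not start while the old one is still tracked.
        assert self.client is None, "timeout thread has not been unlocked yet"
        self.client = client

    def end(self, client):
        assert self.client is client
        self.client = None


class Storage:
    """Hypothetical model of a server-side object holding the locked flag."""

    def __init__(self, timeout, clear_flag_first):
        self.timeout = timeout
        self.locked = 0
        self.clear_flag_first = clear_flag_first
        # Hook simulating other code running between the two cleanup steps.
        self.between_steps = lambda: None

    def tpc_begin(self):
        if self.locked:
            return False          # another transaction holds the lock
        self.locked = 1
        self.timeout.begin(self)
        return True

    def _clear_transaction(self):
        if self.clear_flag_first:
            # Ordering described as current: flag first, then timeout.
            self.locked = 0
            self.between_steps()
            self.timeout.end(self)
        else:
            # Proposed ordering: timeout first, then flag.
            self.timeout.end(self)
            self.between_steps()
            self.locked = 0


def simulate(clear_flag_first):
    """Run one commit with a competing begin injected into the window."""
    storage = Storage(TimeoutTracker(), clear_flag_first)
    outcome = []

    def competing_begin():
        try:
            outcome.append("started" if storage.tpc_begin() else "refused")
        except AssertionError:
            outcome.append("assertion")

    storage.tpc_begin()
    storage.between_steps = competing_begin
    storage._clear_transaction()
    return outcome[0], storage.locked


if __name__ == "__main__":
    print("flag cleared first:   ", simulate(True))
    print("timeout cleared first:", simulate(False))
```

In this toy model, clearing the flag first lets the competing begin trip the
assertion and leaves locked set with no transaction running, while the
proposed ordering simply refuses the competing begin and ends with the flag
cleared.  Whether the real server can actually interleave this way is exactly
the open question in this thread.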

I've been working on ZODB 3.1 lately, which doesn't have the locked
flag.  What's strange is that asyncore is supposed to effectively
serialize individual method calls.  That is, I don't see how the race
actually occurs.

Have you tried your solution?  Working code would encourage us to find
an explanation for why it works :-).

Jeremy




