[ZODB-Dev] StorageServer's waiting list

Mon Nov 9 12:11:06 EST 2009

Hi,

On 11/09/2009 05:01 PM, Jim Fulton wrote:
> On Mon, Nov 9, 2009 at 9:25 AM, Christian Theune <ct at gocept.com> wrote:
> ...
>> Reading the code talking to tpc_transaction I found that this seems to
>> be merely an optimization (which I can disable by just letting
>> tpc_transaction return None all the time).
> 
> No, it is used to decide if the underlying storage is committing.

Right. It's the non-blocking version of doing tpc_begin then.

>> Why is the waiting list necessary?
> 
> To avoid blocking the server waiting for an underlying storage's commit lock.

Hmm. Ah, blocking the server would mean that load calls from other
clients are blocked even though they could be served at that point.
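The idea of a waiting list is easier to see in a toy model. Below is a minimal sketch, assuming a hypothetical `ToyStorage` with a single commit lock and a hypothetical `ToyServer`. This is not ZEO's actual StorageServer code. Only the role of `tpc_transaction()` as a non-blocking "is someone committing?" check comes from the discussion above. When the lock is held, the server parks the vote instead of blocking, so loads from other clients are still answered.

```python
from collections import deque


class ToyStorage:
    """Hypothetical stand-in for an underlying storage with a commit lock."""

    def __init__(self):
        self._committing = None  # transaction currently holding the lock

    def tpc_transaction(self):
        # Non-blocking check: None means nobody is committing.
        return self._committing

    def tpc_begin(self, txn):
        assert self._committing is None, "commit lock already held"
        self._committing = txn

    def tpc_finish(self, txn):
        assert self._committing == txn
        self._committing = None


class ToyServer:
    """Hypothetical server that queues votes instead of blocking on the lock."""

    def __init__(self, storage):
        self.storage = storage
        self.waiting = deque()
        self.log = []

    def vote(self, txn):
        if self.storage.tpc_transaction() is not None:
            # Storage busy: park the transaction and return at once, so the
            # server can keep answering loads from other clients.
            self.waiting.append(txn)
            self.log.append(("queued", txn))
            return False
        self._restart(txn)
        return True

    def _restart(self, txn):
        self.storage.tpc_begin(txn)
        self.log.append(("begin", txn))

    def finish(self, txn):
        self.storage.tpc_finish(txn)
        self.log.append(("finish", txn))
        if self.waiting:
            # Hand the lock to the next transaction on the waiting list.
            self._restart(self.waiting.popleft())

    def load(self, oid):
        # Reads never wait for the commit lock in this toy model.
        self.log.append(("load", oid))
        return oid


server = ToyServer(ToyStorage())
server.vote("t1")      # t1 takes the commit lock
server.vote("t2")      # lock held: t2 goes on the waiting list
server.load("oid-1")   # reads are still served meanwhile
server.finish("t1")    # t1 done: t2 is restarted from the waiting list
```

The important choice here is that `vote()` returns instead of waiting on the lock. Only the transaction that has to wait is deferred, and the server itself keeps running.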

>> And why does it work alright in a ZEO
>> fan-out scenario?
> 
> Why wouldn't it?

I'll try to explain what I see:

Assume three ZEO servers ZEO, ZEO1, and ZEO2. ZEO1 and ZEO2 are clients
of ZEO. Also, assume three Zope servers Z1a, Z1b and Z2. Z1a and Z1b talk
to ZEO1, and Z2 talks to ZEO2.

The interaction that I see is this:

- Z1a calls ClientStorage.tpc_begin(), which locally causes
  tpc_transaction() to start returning a non-None value, blocking
  further tpc_begin() calls from this Zope server from now on. The
  StorageServer also has a safeguard against this.

- ClientStorage then causes ZEOStorage.tpc_begin() to be called on ZEO1,
  which sets up the ZEOStorage's commit log. The storage behind ZEO1
  doesn't see anything yet.

- Z1a calls ClientStorage.store() and pushes data into the commit log.
  Those initial steps can happen from multiple ZEO clients in parallel,
  but only once per client.

- Z1a calls ClientStorage.tpc_vote(), which causes ZEO1's
  ZEOStorage.vote() to be called. That calls _wait(), which in turn
  calls _restart(), which finally calls tpc_begin() on ZEO1's underlying
  storage (the client connection to the upstream ZEO). ZEO1 then replays
  its commit log into the upstream ZEO and, once the log is exhausted,
  calls vote() upstream, which causes tpc_begin() to be called on the
  final storage.
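The nesting of these steps can be traced with a small simulation. This is a minimal sketch with hypothetical `Server` and `FinalStorage` classes. It collapses ZEO1's ClientStorage into a direct reference to the upstream server and leaves Z1a's own ClientStorage out. It only records the order of calls and does not reproduce ZEO's real code paths.

```python
trace = []


class FinalStorage:
    """Hypothetical stand-in for the storage behind the upstream ZEO."""

    def __init__(self):
        self.data = []

    def tpc_begin(self, txn):
        trace.append("final.tpc_begin")

    def store(self, txn, data):
        self.data.append(data)


class Server:
    """Toy ZEO server: buffers stores in a commit log until vote()."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage  # a FinalStorage or another (upstream) Server
        self.commit_log = []

    def tpc_begin(self, txn):
        # Only the commit log is prepared; the storage behind sees nothing.
        trace.append(f"{self.name}.tpc_begin (commit log only)")
        self.commit_log = []

    def store(self, txn, data):
        self.commit_log.append(data)

    def vote(self, txn):
        trace.append(f"{self.name}.vote")
        # Stand-in for _wait()/_restart(): the storage behind this server
        # first hears about the transaction here.
        self.storage.tpc_begin(txn)
        for data in self.commit_log:
            self.storage.store(txn, data)  # replay the commit log
        if isinstance(self.storage, Server):
            self.storage.vote(txn)  # forward the vote upstream


zeo = Server("ZEO", FinalStorage())
zeo1 = Server("ZEO1", zeo)  # ZEO1's storage plays the role of its ClientStorage

zeo1.tpc_begin("t1")
zeo1.store("t1", "a")
zeo1.vote("t1")
```

The final storage's tpc_begin() appears only after both ZEO1.vote and ZEO.vote. Until the vote, neither layer touches the storage behind it.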

At this point, ZEO2 doesn't know about the ongoing transaction in the
upstream ZEO, but ZEO1 does.

Z1a will not be able to issue another commit; those are blocked locally
by ClientStorage. Pure read transactions will still go through.

Z1b will be fine because ZEO1 knows about the ongoing commit and puts
Z1b into the waiting list when trying to vote, allowing other reads from
that connection to go through.

However, when Z2 tries to commit, it starts filling the commit log on
ZEO2. ZEO2 doesn't know about the ongoing commit on the upstream ZEO and
will allow the vote phase to go upstream. Now the commit from Z2 gets
stuck, because it is put on the waiting list of the upstream ZEO while
ZEO2 thinks it was able to proceed.

This will cause Z2 to become completely stuck and not benefit from the
waiting list on ZEO2.
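The scenario can be reduced to a toy model. Each server knows only about the commits that it forwarded itself. This is a minimal sketch with hypothetical `Upstream` and `Middle` classes, not ZEO code. A return value of "waiting upstream" stands in for the point where, as described above, ZEO2's vote has gone upstream and is parked on the upstream waiting list while ZEO2 believes it is proceeding.

```python
from collections import deque


class Upstream:
    """Toy upstream ZEO: one commit lock plus its own waiting list."""

    def __init__(self):
        self.committing = None
        self.waiting = deque()

    def vote(self, txn):
        if self.committing is not None:
            self.waiting.append(txn)
            return "waiting upstream"
        self.committing = txn
        return "locked upstream"


class Middle:
    """Toy fan-out ZEO (ZEO1/ZEO2): only knows about commits it forwarded."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream
        self.committing = None
        self.waiting = deque()

    def vote(self, txn):
        if self.committing is not None:
            # Local waiting list: this middle server stays responsive.
            self.waiting.append(txn)
            return f"waiting on {self.name}"
        # No local commit in progress, so the vote is forwarded upstream.
        self.committing = txn
        return self.upstream.vote(txn)


zeo = Upstream()
zeo1 = Middle("ZEO1", zeo)
zeo2 = Middle("ZEO2", zeo)

results = {
    "Z1a": zeo1.vote("Z1a"),  # takes the upstream lock
    "Z1b": zeo1.vote("Z1b"),  # caught by ZEO1's own waiting list
    "Z2": zeo2.vote("Z2"),    # passes ZEO2's check, then waits upstream
}
```

Z1b is held back by ZEO1's local waiting list. Z2 passes ZEO2's local check because ZEO2 has no commit of its own in progress, so it is parked on the upstream waiting list while ZEO2 still records Z2 as its committing transaction.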

Sorry for the bloated example; I think it's the smallest one that explains the problem.

Am I misunderstanding how the waiting list works?

Christian

-- 
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development