[Checkins] [zopefoundation/ZEO] c29308: Don't triplicate connection attempts

GitHub noreply at github.com
Mon May 27 16:53:14 UTC 2013


  Branch: refs/heads/master
  Home:   https://github.com/zopefoundation/ZEO
  Commit: c293082e58dce9fbcbe65fe90b49f6166622090a
      https://github.com/zopefoundation/ZEO/commit/c293082e58dce9fbcbe65fe90b49f6166622090a
  Author: Marius Gedminas <marius at gedmin.as>
  Date:   2013-05-27 (Mon, 27 May 2013)

  Changed paths:
    M src/ZEO/tests/new_addr.test
    M src/ZEO/zrpc/client.py

  Log Message:
  -----------
  Don't triplicate connection attempts

The loop over all possible IPv4 and IPv6 addresses turns out to also
loop through all possible socket types (SOCK_STREAM/IPPROTO_TCP,
SOCK_DGRAM/IPPROTO_UDP, SOCK_RAW/IPPROTO_IP).  This meant that each
connection attempt was repeated three times, serially.
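
To illustrate the effect, here is a minimal sketch that uses only the
standard library's socket.getaddrinfo(); it is not the code from
src/ZEO/zrpc/client.py.  The helper name candidate_addresses and port
8100 are made up for the example.  Without a socket type hint the
resolver may return one entry per socket type for the same address (how
many depends on the platform); passing SOCK_STREAM limits it to TCP.

```python
import socket


def candidate_addresses(host, port, socktype=0):
    """Return the distinct (family, socktype, sockaddr) triples for host:port.

    socktype=0 means "no hint", so the resolver is free to return one
    entry per socket type it knows about for each address.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socktype)
    seen = []
    for family, kind, proto, canonname, sockaddr in infos:
        entry = (family, kind, sockaddr)
        if entry not in seen:
            seen.append(entry)
    return seen


# Hypothetical address; a numeric host avoids any DNS lookup.
unfiltered = candidate_addresses("127.0.0.1", 8100)
tcp_only = candidate_addresses("127.0.0.1", 8100, socket.SOCK_STREAM)

print(len(unfiltered), "unfiltered entries;", len(tcp_only), "TCP-only entries")
```

A client that loops over the unfiltered list and opens a TCP socket for
every entry will dial the same address once per returned socket type.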

This fixes new_addr.test nondeterministic failures.  Here's a short
reminder of what that test does:

  1. Starts a ZEO server on random port X
  2. Connects and creates some data
  3. Stops the ZEO server
  4. Starts a new ZEO server on random port Y
  5. Tells the old connection about the new address
  6. Makes a modification through a new connection
  7. Waits for the old connection to reconnect using the new address
  8. Verifies that it sees the new data

Here's why the test used to fail:

  * In step 3, when we stop the ZEO server, the client would notice a
    disconnect and immediately try to reconnect.

  * Due to this bug it would open three TCP connections to localhost:X
    and somehow succeed (I've no idea why -- ZEO server is supposed to
    close the listening socket before it drops client connections, so
    how can a new connection to the listening socket succeed?)

  * It would try handshaking each of the connections one after the
    other, timing out after 10 seconds each time (in ZEO.ServerStub.stub).

  * Only after all three connection attempts failed would it sleep for
    `max_disconnect_poll` seconds and then try to connect to localhost:Y

  * Three times 10 seconds is 30 seconds, which happens to be the same
    timeout the test uses in step 7 to wait for a successful
    reconnection.
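
The arithmetic can be sketched as follows.  The 10 and 30 second values
come from the description above; the function name is made up for the
example, and the extra `max_disconnect_poll` sleep is left out because
its value is not stated here.

```python
HANDSHAKE_TIMEOUT = 10  # seconds before one handshake attempt gives up
TEST_WAIT = 30          # seconds the test waits for reconnection in step 7


def seconds_stuck_on_old_port(attempts, handshake_timeout=HANDSHAKE_TIMEOUT):
    """Time spent on serial handshake attempts against the dead server."""
    return attempts * handshake_timeout


before_fix = seconds_stuck_on_old_port(3)  # one attempt per socket type
after_fix = seconds_stuck_on_old_port(1)   # a single TCP attempt

print("before:", before_fix, "s of a", TEST_WAIT, "s budget")
print("after: ", after_fix, "s of a", TEST_WAIT, "s budget")
```

Before the fix the serial timeouts alone use up the whole wait budget,
so whether the test passes depends on timing jitter.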

With this fix the test still does one unnecessary 10 second timeout
before it passes.  I'd love to fix it, but I'm losing hope of
understanding what's actually happening there.




