Skip to content

Commit

Permalink
storage: deflake TestRaftRemoveRace
Browse files Browse the repository at this point in the history
Fixes cockroachdb#16376.

Under stress TestRaftRemoveRace failed with:

  --- FAIL: TestRaftRemoveRace (1.45s)
  client_test.go:1031: change replicas of r1 failed: quota pool no
  longer in use

Consider the following:
- 'add replica' commands get queued up on the replicate queue
- leader replica steps down as leader due to timeouts thus closing the
  quota pool
- commands get out of the queue, cannot acquire quota because the quota
  pool is closed and fail with an error indicating so

TestRaftRemoveRace fails as it expects all replica additions to go
through without failure.

To reproduce this the following minimal test (run under stress) is
sufficient where we lower RaftTickInterval and RaftElectionTimeoutTicks
to make it more likely that leadership changes take place.

  func TestReplicateQueueQuota(t *testing.T) {
    defer leaktest.AfterTest(t)()
      sc := storage.TestStoreConfig(nil)
      sc.RaftElectionTimeoutTicks = 2             // Default: 15
      sc.RaftTickInterval = 10 * time.Millisecond // Default: 200ms
      mtc := &multiTestContext{storeConfig: &sc}
      defer mtc.Stop()
      mtc.Start(t, 3)

      const rangeID = roachpb.RangeID(1)
      mtc.replicateRange(rangeID, 1, 2)

      for i := 0; i < 10; i++ {
          mtc.unreplicateRange(rangeID, 2)
          mtc.replicateRange(rangeID, 2)
      }
  }

The earlier version of TestRaftRemoveRace was written to reproduce the
failure seen in cockroachdb#9037, this was agnostic to raft leadership changes.
  • Loading branch information
irfansharif committed Jun 7, 2017
1 parent d14e7b0 commit eace0dd
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion pkg/storage/client_raft_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2115,7 +2115,11 @@ func TestRaftAfterRemoveRange(t *testing.T) {
// reproduce a race (see #1911 and #9037).
func TestRaftRemoveRace(t *testing.T) {
defer leaktest.AfterTest(t)()
mtc := &multiTestContext{}
sc := storage.TestStoreConfig(nil)
// Suppress timeout-based elections to avoid leadership changes in ways
// this test doesn't expect.
sc.RaftElectionTimeoutTicks = 100000
mtc := &multiTestContext{storeConfig: &sc}
defer mtc.Stop()
mtc.Start(t, 10)

Expand Down

0 comments on commit eace0dd

Please sign in to comment.