Skip to content

Commit

Permalink
server: fix flaky drain test under race
Browse files Browse the repository at this point in the history
While previously `TestDrain` would issue a drain request twice, and
expect that after the second drain request there would be no remaining
leases, we have seen in some race builds that a lease extension can
occur before that second drain, leaving one lease remaining after the
second drain request. This can be seen in the following log example:
```
I230325 00:39:18.151604 14728 1@server/drain.go:145 ⋮ [T1,n1] 383  drain request received with doDrain = true, shutdown = false
...
I230325 00:39:18.155547 986 kv/kvserver/replica_proposal.go:272 ⋮ [T1,n1,s1,r51/1:‹/Table/5{0-1}›,raft] 385  new range lease repl=(n1,s1):1 seq=1 start=0,0 exp=1679704764.152223164,0 pro=1679704758.152223164,0 following repl=(n1,s1):1 seq=1 start=0,0 exp=1679704746.135729956,0 pro=1679704740.135729956,0
I230325 00:39:18.172450 14728 1@server/drain.go:399 ⋮ [T1,n1] 386  (DEBUG) initiating kvserver node drain
I230325 00:39:18.172613 14728 1@kv/kvserver/store.go:1559 ⋮ [T1,drain,n1,s1] 387  (DEBUG) store marked as draining
I230325 00:39:18.182123 14728 1@server/drain.go:293 ⋮ [T1,n1] 388  drain remaining: 1
I230325 00:39:18.182249 14728 1@server/drain.go:295 ⋮ [T1,n1] 389  drain details: range lease iterations: 1
I230325 00:39:18.182404 14728 1@server/drain.go:175 ⋮ [T1,n1] 390  drain request completed without server shutdown
```
This change modifies the test to repeatedly issue drain requests until
there is no remaining work, allowing the drain to complete upon
subsequent requests.

Fixes: cockroachdb#86974

Release note: None
  • Loading branch information
AlexTalks committed Mar 25, 2023
1 parent da20f04 commit 9b2229b
Showing 1 changed file with 15 additions and 8 deletions.
23 changes: 15 additions & 8 deletions pkg/server/drain_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/appstatspb"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/grpcutil"
Expand All @@ -38,7 +37,6 @@ import (
// TestDrain tests the Drain RPC.
func TestDrain(t *testing.T) {
defer leaktest.AfterTest(t)()
skip.UnderRaceWithIssue(t, 86974, "flaky test")
defer log.Scope(t).Close(t)
doTestDrain(t)
}
Expand All @@ -64,18 +62,27 @@ func doTestDrain(tt *testing.T) {

// Issue another probe. This checks that the server is still running
// (i.e. Shutdown: false was effective), the draining status is
// still properly reported, and the server slept once.
// still properly reported, and that the server only slept once (which only
// should occur on the first drain).
resp = t.sendProbe()
t.assertDraining(resp, true)
// probe-only has no remaining.
t.assertRemaining(resp, false)
t.assertEqual(1, drainSleepCallCount)

// Issue another drain. Verify that the remaining is zero (i.e. complete)
// and that the server did not sleep again.
resp = t.sendDrainNoShutdown()
t.assertDraining(resp, true)
t.assertRemaining(resp, false)
// Repeat drain commands until we verify that there are zero remaining leases
// (i.e. complete). Also validate that the server did not sleep again.
testutils.SucceedsSoon(t, func() error {
resp = t.sendDrainNoShutdown()
if !resp.IsDraining {
return errors.Newf("expected draining")
}
if resp.DrainRemainingIndicator > 0 {
return errors.Newf("still %d remaining, desc: %s", resp.DrainRemainingIndicator,
resp.DrainRemainingDescription)
}
return nil
})
t.assertEqual(1, drainSleepCallCount)

// Now issue a drain request without drain but with shutdown.
Expand Down

0 comments on commit 9b2229b

Please sign in to comment.