131452: roachtest: reduce wal-failover flake around stall durations r=sumeerbhola a=itsbilal

Previously, we ran the risk of a prolonged (longer than 30/60s) disk stall if the stall command took a long time to run and return a value to us. This change starts the 30s timer before issuing the stall command instead of after it returns, so time spent in the command counts toward the stall duration. That reduces the likelihood that we unintentionally stall the node for more than 60s and trip the disk stall detector.

Fixes cockroachdb#131182.

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
craig[bot] and itsbilal committed Sep 26, 2024
2 parents c56def7 + 55a622b commit ce5747b
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions pkg/cmd/roachtest/tests/disk_stall.go
@@ -128,6 +128,7 @@ func runDiskStalledWALFailover(
continue
}
func() {
+ stopStall := time.After(30 * time.Second)
s.Stall(ctx, c.Node(1))
// NB: We use a background context in the defer'ed unstall command,
// otherwise on test failure our Unstall calls will be ignored. Leaving
@@ -142,7 +143,7 @@ func runDiskStalledWALFailover(
select {
case <-ctx.Done():
t.Fatalf("context done while stall induced: %s", ctx.Err())
- case <-time.After(30 * time.Second):
+ case <-stopStall:
// Return from the anonymous function, allowing the
// defer to unstall the node.
return
@@ -156,7 +157,7 @@ func runDiskStalledWALFailover(
time.Sleep(1 * time.Second)
exit, ok := getProcessExitMonotonic(ctx, t, c, 1)
if ok && exit > 0 {
- t.Fatal("process exited unexectedly")
+ t.Fatal("process exited unexpectedly")
}

data := mustGetMetrics(ctx, c, t, adminURL, install.SystemInterfaceName,
