Merge #105190
105190: roachtest: let `failover` clusters recover r=erikgrinaker a=erikgrinaker

Previously, `failover` tests would begin teardown as soon as the last node was recovered, without giving that node any time to actually recover. This could cause problems with post-test assertions, e.g. if replica circuit breakers were still tripped. We'd also like to get proper data for the last failure.

This patch adds a 1-minute wait after recovering the final node, allowing the cluster to recover before teardown begins.

Resolves #104694.
Resolves #105099.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
craig[bot] and erikgrinaker committed Jun 21, 2023
2 parents 1eb628e + f30c4f4 commit 776024b
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions pkg/cmd/roachtest/tests/failover.go
@@ -321,6 +321,8 @@ func runFailoverChaos(ctx context.Context, t test.Test, c cluster.Cluster, readO
failer.Recover(ctx, node)
}
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()
@@ -464,6 +466,8 @@ func runFailoverPartialLeaseGateway(ctx context.Context, t test.Test, c cluster.
}
}
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()
@@ -597,6 +601,8 @@ func runFailoverPartialLeaseLeader(ctx context.Context, t test.Test, c cluster.C
failer.Recover(ctx, node)
}
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()
@@ -711,6 +717,8 @@ func runFailoverPartialLeaseLiveness(ctx context.Context, t test.Test, c cluster
failer.Recover(ctx, node)
}
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()
@@ -931,6 +939,8 @@ func runFailoverLiveness(
failer.Recover(ctx, 4)
relocateLeases(t, ctx, conn, `range_id = 2`, 4)
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()
@@ -1046,6 +1056,8 @@ func runFailoverSystemNonLiveness(
failer.Recover(ctx, node)
}
}

sleepFor(ctx, t, time.Minute) // let cluster recover
return nil
})
m.Wait()