roachtest: `failover` variant for partial network partitions #94614
Comments
cc @cockroachdb/replication
@andrewbaptist Note that we already have roachtest variants for asymmetric partitions, which is what's most relevant for #84289: cockroach/pkg/cmd/roachtest/tests/failover.go Lines 43 to 51 in 42e6639
The pMax recovery time is graphed nightly and looks terrible because we don't handle these failures at all (60 seconds is the maximum latency the tests currently measure).
This is not done yet: the test in #95394 does not cover two important classes of partition, leader/leaseholder and leaseholder/gateway.
I'll pick this up as part of the Raft prevote work.
We should add `failover` roachtest variants that benchmark range unavailability in the case of partial network partitions. There are two main variants, in both of which all nodes can still reach the liveness leaseholder: range leaseholder partitioned away from the Raft leader, and SQL gateway partitioned away from the range leaseholder. See also internal doc.
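To make the two topologies concrete, here is a minimal sketch that models each variant as a set of blocked links and checks the stated constraint that every node can still reach the liveness leaseholder. It is not roachtest code: the `Role`, `Partition`, `CanReach`, and `AllReachLiveness` names are hypothetical, and it assumes each partition is symmetric (traffic dropped in both directions) and that each role lives on a separate node.

```go
package main

import "fmt"

// Role names a node by the part it plays in the scenario. These names are
// illustrative only and are not identifiers from the roachtest code.
type Role string

const (
	Liveness    Role = "liveness-leaseholder"
	RaftLeader  Role = "raft-leader"
	Leaseholder Role = "range-leaseholder"
	Gateway     Role = "sql-gateway"
)

// link is an unordered pair of nodes whose traffic is dropped in both
// directions (assumption: the partition is symmetric).
type link struct{ a, b Role }

// Partition is a set of blocked links; every other pair stays connected.
type Partition struct {
	Name    string
	blocked []link
}

// CanReach reports whether a and b can still talk to each other directly.
func (p Partition) CanReach(a, b Role) bool {
	for _, l := range p.blocked {
		if (l.a == a && l.b == b) || (l.a == b && l.b == a) {
			return false
		}
	}
	return true
}

// AllReachLiveness checks the constraint from the issue text: every node
// must still be able to reach the liveness leaseholder.
func (p Partition) AllReachLiveness(nodes []Role) bool {
	for _, n := range nodes {
		if !p.CanReach(n, Liveness) {
			return false
		}
	}
	return true
}

// nodes assumes one node per role.
var nodes = []Role{Liveness, RaftLeader, Leaseholder, Gateway}

// variants lists the two partial partitions described in the issue.
var variants = []Partition{
	{Name: "leaseholder/leader", blocked: []link{{Leaseholder, RaftLeader}}},
	{Name: "gateway/leaseholder", blocked: []link{{Gateway, Leaseholder}}},
}

func main() {
	for _, p := range variants {
		fmt.Printf("%s: all nodes reach liveness = %t\n",
			p.Name, p.AllReachLiveness(nodes))
	}
}
```

In both variants only a single link is cut, so the cluster as a whole stays connected and the range's unavailability comes from the specific pair that cannot communicate.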
Jira issue: CRDB-23045
Epic CRDB-25212