-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv/kvserver: TestReplicaRemovalClosesProposalQuota failed #96932
Comments
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 2a7edbeb0737b1309064c25c641a309c2980d9ba:
Parameters: |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 2a7edbeb0737b1309064c25c641a309c2980d9ba:
Parameters: |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 3e26d85118ef73133c00b04b17449c27c31b8bc4:
Parameters: |
Related to the replica refactorings @tbg?
|
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 5455e9e7e4d533d8ca5dfab496a116888a610c58:
|
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 7e2df35a2f6bf7a859bb0539c8ca43c4e72ed260:
Parameters: |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ c95bef097bd4c213c6b5c0c125a9a846c4479d73:
Parameters: |
Unclear but possible, I'll need to bisect. |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ fb740d4b50b5d77b6fa2a10771058dc6cf1bb1b2:
|
// NB: We need to be sure that our Replica is the leaseholder for this
// test to make sense. It usually is.
lease, pendingLease := repl.GetLease()
if pendingLease != (roachpb.Lease{}) || lease.OwnedBy(store.StoreID()) {
skip.IgnoreLint(t, "the replica is not the leaseholder, this happens rarely under stressrace")
} 👀 That's wrong, it ought to be So to start, the test has essentially been skipped for the last three years 😅 It hangs when fixed, so the test is basically broken right now. When the test skips itself that way, we shut down the server and seem to be leaking async proposal spans some of the time. This isn't supposed to happen, we clean them up here: cockroach/pkg/kv/kvserver/store_raft.go Lines 730 to 749 in 736a67e
|
I printf'ed both the above clean-up and the place where the spans are created and it doesn't look like there's an interleaving. We create some spans, and only ~100ms later do we attempt to clear out the spans, and still we leak some. The test doesn't actually get to remove the replica (the skip comes too early), so not sure what is special about this test. It could be that it is setting a very small proposal quota. Maybe there are some ways in which proposals can exit the system with their trace span unfinished. |
With a larger proposal quota the failure goes away. So likely we are at a risk of leaking proposal spans if the server shuts down while blocking on proposal quota acquisition. I verified with some printf-logging that we are indeed seeing some proposals return in cockroach/pkg/kv/kvserver/replica_raft.go Lines 265 to 268 in 560ea7d
in the failing runs. There wouldn't be anyone finishing the spans there. That can be fixed! |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 8c54290f5f1ad68976e59a5f623126ae9d734bf0:
Parameters: |
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ cf14ad694ee562676f53e36fa8495206c3aed61f:
Parameters: |
Previously, if `evalAndPropose` returned with an error while trying to propose a pipelined write, it would leak the trace span. Make sure this doesn't happen. The test that exercised this got deleted in the previous commit, but this is still a bug that could cause a larger leak if a condition were ever added to `evalAndPropose` which could cause a large amount of errors (perhaps spread out over a longer time period) in production clusters. Unfortunately, writing a test for this is likely to be a net negative; if the reviewer feels strongly I can add a testing knob to inject an error in the right place in this method and exercise it that way. Touches cockroachdb#96932. Epic: none Release note: None
Previously, if `evalAndPropose` returned with an error while trying to propose a pipelined write, it would leak the trace span. Make sure this doesn't happen. The test that exercised this got deleted in the previous commit, but this is still a bug that could cause a larger leak if a condition were ever added to `evalAndPropose` which could cause a large amount of errors (perhaps spread out over a longer time period) in production clusters. Unfortunately, writing a test for this is likely to be a net negative; if the reviewer feels strongly I can add a testing knob to inject an error in the right place in this method and exercise it that way. Touches cockroachdb#96932. Epic: none Release note: None
98685: kvserver: remove TestReplicaRemovalClosesProposalQuota r=pavelkalinnikov a=tbg As this test explained in its comment, the circumstances it was seeking to test were very rarely met and even if the behavior were missing, this wouldn't cause any problems. The test has been skipped (accidentally) for a long time, was flaky even then (leaking trace spans, see below), and it's not easy to enact the situation the test wishes to construct in an idiomatic way. It's pretty clear that the lines the test wants are present: https://github.com/cockroachdb/cockroach/blob/736a67e0d36cc545bf74d65db069ee895ff9bea0/pkg/kv/kvserver/replica_destroy.go#L174-L176 It's better to remove the test at this point. Closes #96932. Also, fix a buglet that the test "accidentally" uncovered: Previously, if `evalAndPropose` returned with an error while trying to propose a pipelined write, it would leak the trace span. Make sure this doesn't happen. The test that exercised this got deleted in the previous commit, but this is still a bug that could cause a larger leak if a condition were ever added to `evalAndPropose` which could cause a large amount of errors (perhaps spread out over a longer time period) in production clusters. Unfortunately, writing a test for this is likely to be a net negative; if the reviewer feels strongly I can add a testing knob to inject an error in the right place in this method and exercise it that way. Touches #96932. Epic: none Release note: None Co-authored-by: Tobias Grieger <[email protected]>
kv/kvserver.TestReplicaRemovalClosesProposalQuota failed with artifacts on master @ 9ea0ef2f6a8e31e22c7350de5cacd9c5992d07a0:
Parameters:
TAGS=bazel,gss
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-24415
The text was updated successfully, but these errors were encountered: