-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv/kvserver: TestDecommission failed #96630
Comments
@AlexTalks wanna have a look? |
We have marked this test failure issue as stale because it has been |
|
Refs: cockroachdb#96630 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None Epic: None
Refs: cockroachdb#96630 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None Epic: None
kv/kvserver.TestDecommission failed with artifacts on master @ 8833248e0cb5d93d3ea30936e86702802b91dc7c:
|
kv/kvserver.TestDecommission failed with artifacts on master @ 8833248e0cb5d93d3ea30936e86702802b91dc7c:
|
kv/kvserver.TestDecommission failed with artifacts on master @ 8833248e0cb5d93d3ea30936e86702802b91dc7c:
|
Test is flaking due to a timing issue, I'll remove GA blocker and send a fix later today. |
On further inspection this is due to #100783 - when in a metamorphic build with mux range feeds enabled, the span configs never get updated. |
Similar to the issue found in cockroachdb#99207, it was discovered that `TestDecommission` is also failing due to cockroachdb#100783 - under metamorphic builds, mux range feeds (among other settings) are enabled, which cause span config updates (which depend on range feeds) to fail. As can be seen in the test, this causes expectations on things such as the replication factor applied to a range to fail due to being out-of-date. Until cockroachdb#100783 is fixed, this changes `TestDecommission` to be skipped under metamorphic builds; stressing this test has shown that it succeeds repeatedly when in a non-metamorphic build. Fixes: cockroachdb#96630 Release note: None
cc @miretskiy |
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
102094: kvcoord: Fix error handling and retries in mux rangefeed client r=miretskiy a=miretskiy Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes #96630 Fixes #100783 Informs #99631 Informs #101614 Release note: None Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
Fix error handling and retries when restarting rangefeeds. A big difference between regular rangefeed, and mux rangefeed, is regular rangefeed has a dedicated go routine per range. This go routine is responsible for running a rangefeed, handling its errors, management of state pertaining to the RPC call (transport, retry information, routing information, etc), and restarting rangefeed with backoff as needed. Mux rangefeed, on the other hand, is not "loop" based. Prior to this PR, mux rangefeed, when it encountered a transient error, would loose a lot of the restart state mentioned above. For example, it would loose the transport information, so that the restart would run against the same node as before, resulting, potentially, in busy loops. Those busy loops (where the RPC call is restarted against the same node/replica that just experienced an error), would tend to make test flaky since they would take longer time to converge to the state expected by the tests (such as `TestDecommission`) test. This PR fixes this loss of state pertaining to single range restart by associating this state with the long lived `activeMuxRangeFeed` state. Fixes cockroachdb#96630 Fixes cockroachdb#100783 Informs cockroachdb#99631 Informs cockroachdb#101614 Release note: None
kv/kvserver.TestDecommission failed with artifacts on release-22.1 @ 8816f14ae071d448d74c35e51c26690e91519f74:
Parameters:
TAGS=bazel,gss
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-24248
The text was updated successfully, but these errors were encountered: