Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: deflake TestReplicaStateMachineChangeReplicas #69730

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #69229.

The test was reaching below Raft and adding a peer on a node that does not
exist. With a large enough pause between r.raftMu.Unlock() and testContext
teardown, it was possible for the replica to be ticked, which would cause it to
attempt to reach out and heartbeat this node. This could cause issues. In fact,
the testContext uses a dummy RaftTransport, so any attempt to send traffic
to another Raft peer would panic.

This commit resolves the issue by giving testContext a slightly more real
version of a RaftTransport (i.e. one configured with a nodedialer). This feels a
bit wrong, as a testContext should really never need to use a RaftTransport, but
given how little impact this has on the dependency structure of a testContext
and given that the testContext was already doing something similar with gossip,
I think this is fine.

Release justification: test fix.

Fixes cockroachdb#69229.

The test was reaching below Raft and adding a peer on a node that does not
exist. With a large enough pause between `r.raftMu.Unlock()` and `testContext`
teardown, it was possible for the replica to be ticked, which would cause it to
attempt to reach out and heartbeat this node. This could cause issues. In fact,
the `testContext` uses a dummy `RaftTransport`, so any attempt to send traffic
to another Raft peer would panic.

This commit resolves the issue by giving testContext a slightly more real
version of a RaftTransport (i.e. one configured with a nodedialer). This feels a
bit wrong, as a testContext should really never need to use a RaftTransport, but
given how little impact this has on the dependency structure of a testContext
and given that the testContext was already doing something similar with gossip,
I think this is fine.

Release justification: test fix.
@nvanbenschoten nvanbenschoten requested a review from tbg September 2, 2021 05:52
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner September 2, 2021 05:52
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 3, 2021

Build succeeded:

@craig craig bot merged commit c5de0db into cockroachdb:master Sep 3, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/deflake69229 branch September 7, 2021 16:35
tbg added a commit to tbg/cockroach that referenced this pull request Dec 8, 2022
Some tests do add peers and so we'll see some use of the
raft transport. This doesn't have to work properly, just
not crash, which this achieves.

This is essentially a re-do of cockroachdb#69730 which was lost in cockroachdb#72383.

Release note: None
tbg added a commit that referenced this pull request Dec 20, 2022
Some tests do add peers and so we'll see some use of the
raft transport. This doesn't have to work properly, just
not crash, which this achieves.

This is essentially a re-do of #69730 which was lost in #72383.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestReplicaStateMachineChangeReplicas failed
3 participants