Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testcluster: fix dial check on restart #111199

Merged

Conversation

aliher1911
Copy link
Contributor

Previously server restart was incorectly dialing loopback instead of other nodes to reset circuit breakers.
This commit fixes it to dial from existing nodes to restarted one.

Epic: none
Fixes: #111163

Release note: None

Previously server restart was incorectly dialing loopback instead of
other nodes to reset circuit breakers.
This commit fixes it to dial from existing nodes to restarted one.

Epic: none
Fixes: cockroachdb#111163

Release note: None
@aliher1911 aliher1911 requested a review from a team September 25, 2023 11:57
@aliher1911 aliher1911 self-assigned this Sep 25, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor

Is this relevant for backports?

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks.

@@ -1787,6 +1787,7 @@ func (tc *TestCluster) RestartServerWithInspect(
// node. This is useful to avoid flakes: the newly restarted node is now on a
// different port, and a cycle of gossip is necessary to make all other nodes
// aware.
id := s.StorageLayer().NodeID()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just s.NodeID()? Is there some kind of mandate to type the "layer" explicitly? CC @knz to help with the answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to be explicit about what we are trying to use StorageLayer (i.e. underlying raw KV) or tenant layer. I think it makes little difference here as we manipulate servers themselves, but maybe having intent expressed explicitly is better. Then we can deprecate those methods on the level above where server could have no storage and no nodeid?

@aliher1911
Copy link
Contributor Author

aliher1911 commented Sep 25, 2023

There's no bug in 23.1, it was using correct servers for source and destination. Looks like refactor to introduce StorageLayer() broke this.
This is the culprit: #107866 I imagine it was hard to spot 2 character change is such a massive mechanical PR.

@aliher1911
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 25, 2023

Build succeeded:

@craig craig bot merged commit 9757421 into cockroachdb:master Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv/kvserver/loqrecovery: TestRejectBadVersionApplication failed
4 participants