Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: fix use of canceled connection in init #89059

Merged
merged 1 commit into from
Sep 30, 2022
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Sep 30, 2022

The implementation of ./cockroach init first dials up the node (in a
retry loop) and issues a Health RPC, to reduce the likelihood of
accidental double-init of the cluster in case of network issues.

However, this was trying to be too clever: if it managed to dial the
node and check its health (using a short-lived context), it would
then return that same connection to the caller for use in the Bootstrap
RPC. Unfortunately, that connection would sit on top of an rpc.Context
whose life was tied to a context1 that descended from one wrapped by
RunWithTimeout. In other words, the context would be cancelled by the
time the health check method returned.

This seems to not cause issues today, since the rpc.Context seems to
ignore this context cancellation. But in #88625, this suddenly became
a problem and now that I've looked at this code, might as wel fix it
regardless of whether #88625 is ever going to land.

No release note since today's code happens to work.

Release note: None

Footnotes

  1. https://github.com/cockroachdb/cockroach/blob/aecc58f125ac611f5793101cbd0323df569369db/pkg/cli/rpc_client.go#L46

The implementation of `./cockroach init` first dials up the node (in a
retry loop) and issues a `Health` RPC, to reduce the likelihood of
accidental double-init of the cluster in case of network issues.

However, this was trying to be too clever: if it managed to dial the
node and check its health (again, using a short-lived context), it would
then return that same connection to the caller for use in the Bootstrap
RPC. Unfortunately, that connection would sit on top of an `rpc.Context`
whose life was tied to a context[^1] that descended from one wrapped by
`RunWithTimeout`. In other words, the context would be cancelled by the
time the health check method returned.

This seems to not cause issues today, since the `rpc.Context` seems to
ignore this context cancellation. But in cockroachdb#88625, this suddenly became
a problem and now that I've looked at this code, might as wel fix it
regardless of whether cockroachdb#88625 is ever going to land.

No release note since today's code happens to work.

[^1]: https://github.com/cockroachdb/cockroach/blob/aecc58f125ac611f5793101cbd0323df569369db/pkg/cli/rpc_client.go#L46

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg marked this pull request as ready for review September 30, 2022 08:29
@tbg tbg requested a review from a team as a code owner September 30, 2022 08:29
@tbg tbg requested a review from erikgrinaker September 30, 2022 08:29
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

today's code happens to work

The best kind of code.

@tbg
Copy link
Member Author

tbg commented Sep 30, 2022

bors r=erikgrinaker
TFTR!

@craig
Copy link
Contributor

craig bot commented Sep 30, 2022

Build succeeded:

@craig craig bot merged commit aaca5ce into cockroachdb:master Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants