rpc: Perform initial-heartbeat validation on GRPC reconnections #22518

Merged Feb 9, 2018 (2 commits)

@bdarnell bdarnell commented Feb 8, 2018

gRPC will transparently reconnect when a connection fails, but if the
next process to use that port is not part of the same cluster, this
leads to confusing errors and potential data corruption. (This is most
common in tests, but it can also occur in other situations.)

This change disables gRPC's automatic reconnections so that, when a
connection fails, we go through our full dialing process again,
including an initial heartbeat that validates certain parameters.

Fixes #20537

Release note (bug fix): Implement additional safeguards against RPC
connections between nodes that belong to different clusters.

@bdarnell bdarnell requested a review from a team February 8, 2018 18:54

@petermattis

LGTM


Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/rpc/context.go, line 428 at r1 (raw file):

}

// onlyOnceDialer implements the grpc.WithDialer interface but only

Pedantically, this isn't implementing an interface, but is used to provide a custom dialer for grpc.WithDialer.


Comments from Reviewable

@nvanbenschoten

Reviewed 1 of 3 files at r1.
Review status: 1 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


pkg/rpc/context.go, line 479 at r1 (raw file):

		redialChan: make(chan struct{}),
	}
	dialOpts = append(dialOpts, grpc.WithDialer(dialer.dial))

Do we need to set FailOnNonTempDialError as well?



@bdarnell

bdarnell commented Feb 8, 2018

Review status: 1 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


pkg/rpc/context.go, line 479 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to set FailOnNonTempDialError as well?

It doesn't really make any difference; it'll fail either way because grpc only has one address to try.



This reverts commit 608292b.

This message is no longer needed or accurate now that we always
validate cluster IDs.

Fixes cockroachdb#14231

Release note: None
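
As a rough picture of what validating cluster IDs on the initial heartbeat could look like, the sketch below rejects a heartbeat whose cluster ID differs from the local one. Everything in it is hypothetical: the `pingRequest` type, its `ClusterID` field, and `validateInitialHeartbeat` are illustrative names, not the project's actual API, and the sketch ignores cases such as a node that has not yet learned its cluster ID.

```go
package main

import "fmt"

// pingRequest is a hypothetical stand-in for the initial heartbeat payload.
type pingRequest struct {
	ClusterID string
}

// validateInitialHeartbeat returns an error when the peer's cluster ID does
// not match the local cluster ID, and nil when the two are equal.
func validateInitialHeartbeat(localClusterID string, req pingRequest) error {
	if req.ClusterID != localClusterID {
		return fmt.Errorf(
			"heartbeat rejected: peer cluster ID %q does not match local cluster ID %q",
			req.ClusterID, localClusterID,
		)
	}
	return nil
}

func main() {
	local := "cluster-a" // hypothetical identifier

	// A peer from the same cluster passes validation.
	fmt.Println(validateInitialHeartbeat(local, pingRequest{ClusterID: "cluster-a"}))

	// A different process that took over the port fails validation.
	fmt.Println(validateInitialHeartbeat(local, pingRequest{ClusterID: "cluster-b"}))
}
```

Because reconnections now repeat the full dial, a check of this kind runs on every new connection rather than only on the first.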
Successfully merging this pull request may close these issues.

core: RPC re-connections are not validated