rpc: use the loopback conn also for GRPCDialOptions #103764

Merged
merged 1 commit into from
May 23, 2023

Conversation

knz
Contributor

@knz knz commented May 23, 2023

Fixes #103762.
Fixes #99261.
Fixes #103692.
Epic: CRDB-28893

For context, rpc.GRPCDialOptions is used in two cases:

  • when connecting to other nodes as specified by the --join flag.
  • in the grpc-gateway code, to route incoming HTTP requests to the RPC subsystem.

The first one nearly always targets remote nodes. The second one always targets the local node (it's a loopback connection).

Prior to this patch, both callers of rpc.GRPCDialOptions were unconditionally served the regular "remote network conn" dial options, including the backoff, the only-once-dialer and other parameters suited to connecting to remote nodes.

While this choice is suitable for the --join logic, it's not suitable for the grpc-gateway loopback conn. In that case, we want to avoid all the network intelligence and especially avoid the only-once-dialer and circuit breaker.

This patch ensures that grpc-gateway receives the loopback parameters properly.
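The shape of the change can be sketched as selecting dial parameters by connection kind. This is a simplified illustration only: `connKind`, `dialConfig` and `dialConfigFor` are hypothetical names, the one-second backoff is an arbitrary placeholder, and the real code works with gRPC dial options rather than a plain struct.

```go
package main

import (
	"fmt"
	"time"
)

// connKind distinguishes the two callers described above.
type connKind int

const (
	remoteConn   connKind = iota // e.g. nodes listed in --join
	loopbackConn                 // e.g. grpc-gateway talking to the local node
)

// dialConfig is a hypothetical stand-in for a set of dial options.
type dialConfig struct {
	backoffMax     time.Duration
	onlyOnceDialer bool
	circuitBreaker bool
}

// dialConfigFor returns the network-aware settings for remote peers
// and a bare configuration for the loopback connection.
func dialConfigFor(kind connKind) dialConfig {
	if kind == loopbackConn {
		// No backoff, no only-once dialer, no circuit breaker.
		return dialConfig{}
	}
	return dialConfig{
		backoffMax:     time.Second, // placeholder value for the sketch
		onlyOnceDialer: true,
		circuitBreaker: true,
	}
}

func main() {
	fmt.Printf("--join:       %+v\n", dialConfigFor(remoteConn))
	fmt.Printf("grpc-gateway: %+v\n", dialConfigFor(loopbackConn))
}
```

Before the patch, both callers would have received the equivalent of the `remoteConn` configuration.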

Release note (bug fix): A bug was fixed whereby, under high CPU load,
HTTP requests to certain API endpoints (e.g. the health endpoint)
could start failing and then never succeed again until the node was
restarted. This bug was introduced in v23.1.

@knz knz added the backport-23.1.x Flags PRs that need to be backported to 23.1 label May 23, 2023
@knz knz requested a review from tbg May 23, 2023 09:03
@knz knz requested a review from a team as a code owner May 23, 2023 09:03
@blathers-crl

blathers-crl bot commented May 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@knz
Contributor Author

knz commented May 23, 2023

NB: this fix doesn't work, there's a remaining bug. I'm looking into it.

@knz knz marked this pull request as draft May 23, 2023 09:06
@knz knz force-pushed the 20230523-rpc-fix branch from 8eacf34 to 5da5f8b Compare May 23, 2023 10:10
@knz knz marked this pull request as ready for review May 23, 2023 10:10
@knz knz force-pushed the 20230523-rpc-fix branch from 5da5f8b to 3571f52 Compare May 23, 2023 10:13
@knz
Contributor Author

knz commented May 23, 2023

ok this is ready

@knz knz force-pushed the 20230523-rpc-fix branch from 3571f52 to 877111d Compare May 23, 2023 10:15
@knz
Contributor Author

knz commented May 23, 2023

bors r=tbg

@craig
Contributor

craig bot commented May 23, 2023

Build succeeded:

@craig craig bot merged commit 741c91b into cockroachdb:master May 23, 2023
@knz knz deleted the 20230523-rpc-fix branch May 23, 2023 13:27
knz added a commit to knz/cockroach that referenced this pull request Jun 27, 2023
We have fixed the issue that caused the skip in cockroachdb#103764.

Release note: None
craig bot pushed a commit that referenced this pull request Jun 27, 2023
105629: server: unskip TestStatusEngineStatsJson r=rafiss a=knz

Fixes #99261.

We have fixed the issue that caused the skip in #103764.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jun 27, 2023
We have fixed the issue that caused the skip in #103764.

Release note: None