Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: move node status range to system RPC class #111239

Closed
nvanbenschoten opened this issue Sep 25, 2023 · 0 comments · Fixed by #116958
Closed

kv: move node status range to system RPC class #111239

nvanbenschoten opened this issue Sep 25, 2023 · 0 comments · Fixed by #116958
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Sep 25, 2023

We currently have two ranges that qualify for the system RPC class, which provides isolation from networking congestion on data-plane RPC connections:

var systemClassKeyPrefixes = []roachpb.RKey{
roachpb.RKey(keys.Meta1Prefix),
roachpb.RKey(keys.NodeLivenessPrefix),
}

This is important in the context of network saturation like that described in #111238.

In a support case, we saw that the range spanning from /System/NodeLivenessMax to /System/tsd (ID 4, created by these static split points) was impacted by traffic on other ranges. This prevented nodes from restarting because a restart writes to a node status key. Had the range been part of the system RPC class, it would have been fine and the raft transport saturation on the data ranges would have been less impactful.

We should move this range to the system RPC class, as it stores a few important keys and should otherwise see little traffic. Doing so will require some changes to systemClassKeyPrefixes, as unlike the previous two ranges, this one is not part of NoSplitSpans, so it can technically split.

Jira issue: CRDB-31831

Epic CRDB-32846

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-kv Anything in KV that doesn't belong in a more specific category. labels Sep 25, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Sep 25, 2023
@nvanbenschoten nvanbenschoten added the P-2 Issues/test failures with a fix SLA of 3 months label Jan 11, 2024
cockroach-dev-inf pushed a commit that referenced this issue Jan 12, 2024
Move all the ranges between /Min and /System/tsd to
use the default RPC class. This will allow for
isolation from network congestion for all system
ranges, which crucial for the stability of the system.

Fixes: #111239

Release note: None
craig bot pushed a commit that referenced this issue Feb 3, 2024
116958: rpc: move system ranges to system RPC class r=lunevalex a=lunevalex

Move all the ranges between /Min and /System/tsd to use the default RPC class. This will allow for
isolation from network congestion for all system
ranges, which crucial for the stability of the system.

Fixes: #111239

Release note: None

118555: workflows: run UI lint and test in experimental github actions build r=rail a=rickystewart

Epic: [CRDB-8308](https://cockroachlabs.atlassian.net/browse/CRDB-8308)
Release note: None

118581: roachtest: stop ignoring activerecord failures r=rafiss a=rafiss

The adapter has been stabilized, so we should enable this test again.

fixes #108938
Release note: None

118659: tests: use `test.Pool` instead of `Pool` r=rail a=rickystewart

This tells Bazel to use the pool only for test actions instead of the compile action associated with each test.

Epic: CRDB-8308
Release note: None

118673: kv: mark kvnemesis tests as "large" sized r=nvanbenschoten a=arulajmani

We've recently seen these time out exclusively on eng flow. In all those instances, we can see the test is making some progress from the stack traces -- it's slow though. We mark KVNemesis tests as large, which in turn bumps their timeout in CI.

Closes #118624
Closes #118005

Release note: None

118675: ui: remove warning when auto refresh enable r=maryliag a=maryliag

The warning being displayed about old active executions was not bein properly removed when the auto refresh was turned back on.
This commit fixes this for both Statements and Transactions pages, on Active Executions.

Fixes CRDB-35837

https://www.loom.com/share/76b57eba17ab44758fe81f178f07fecd

Release note (ui change): Properly remove warning of old date on Active Executions when auto refresh is enabled.

Co-authored-by: Alex Lunev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: maryliag <[email protected]>
@craig craig bot closed this as completed in fa8b201 Feb 3, 2024
wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this issue Feb 21, 2024
Move all the ranges between /Min and /System/tsd to
use the default RPC class. This will allow for
isolation from network congestion for all system
ranges, which crucial for the stability of the system.

Fixes: cockroachdb#111239

Release note: None
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team
Projects
No open projects
Status: Closed
Development

Successfully merging a pull request may close this issue.

2 participants