kv: move node status range to system RPC class #111239
Labels
A-kv: Anything in KV that doesn't belong in a more specific category.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-2: Issues/test failures with a fix SLA of 3 months
T-kv: KV Team
We currently have two ranges that qualify for the system RPC class, which provides isolation from networking congestion on data-plane RPC connections:
cockroach/pkg/rpc/connection_class.go
Lines 61 to 64 in 967fe5e
This is important in the context of network saturation like that described in #111238.
In a support case, we saw that the range spanning from /System/NodeLivenessMax to /System/tsd (ID 4, created by these static split points) was impacted by traffic on other ranges. This prevented nodes from restarting because a restart writes to a node status key. Had the range been part of the system RPC class, it would have been fine and the raft transport saturation on the data ranges would have been less impactful.

We should move this range to the system RPC class, as it stores a few important keys and should otherwise see little traffic. Doing so will require some changes to systemClassKeyPrefixes, as unlike the previous two ranges, this one is not part of NoSplitSpans, so it can technically split.

Jira issue: CRDB-31831
Epic CRDB-32846
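
A rough sketch of one way the classification could cope with a range that may split: instead of matching a fixed key prefix, treat any range whose start key falls inside the span [/System/NodeLivenessMax, /System/tsd) as system class. Everything here is an assumption for illustration only. The type and function names (ConnClass, span, classForRangeStartKey) are hypothetical, the keys are human-readable placeholders rather than the real encoded keys, and this is not the actual code in connection_class.go.

```go
package main

import (
	"bytes"
	"fmt"
)

// ConnClass is a hypothetical stand-in for an RPC connection class.
type ConnClass int

const (
	DefaultClass ConnClass = iota
	SystemClass
)

// span is a half-open key interval [start, end).
type span struct{ start, end []byte }

// systemSpans lists key spans whose ranges should use the system class.
// The keys are readable placeholders, NOT real CockroachDB key encodings.
var systemSpans = []span{
	{start: []byte("/System/NodeLivenessMax"), end: []byte("/System/tsd")},
}

// classForRangeStartKey returns SystemClass if the range's start key lies
// inside one of systemSpans. Because the check uses span containment, a
// range produced by splitting inside the span is still classified as system.
func classForRangeStartKey(startKey []byte) ConnClass {
	for _, s := range systemSpans {
		if bytes.Compare(startKey, s.start) >= 0 && bytes.Compare(startKey, s.end) < 0 {
			return SystemClass
		}
	}
	return DefaultClass
}

func main() {
	// Placeholder start keys: the span start, a split point inside the
	// span, the span end (exclusive), and an unrelated data key.
	for _, k := range []string{
		"/System/NodeLivenessMax",
		"/System/StatusNode/1",
		"/System/tsd",
		"/Table/50",
	} {
		fmt.Printf("%s system=%t\n", k, classForRangeStartKey([]byte(k)) == SystemClass)
	}
}
```

The containment check is the point of the sketch: a prefix list works when the range can never split, whereas a span check keeps covering whatever ranges end up inside the span after a split.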