kvserver: introduce cpu rebalancing #96127
Conversation
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 20 of 20 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)
-- commits
line 33 at r3:
", however, it will target"
-- commits
line 51 at r3:
Release note (ops change):
or this will trip up automation.
pkg/kv/kvserver/rebalance_objective.go
line 32 at r3 (raw file):
// qps: which is the original default setting and looks at the number of batch // // requests, on a range and store.
nit: something's off with this indentation.
pkg/kv/kvserver/rebalance_objective.go
line 37 at r3 (raw file):
type LBRebalancingObjective int64 const (
This feels like the right place to add some discussion about the details of each of these approaches. For instance, we could discuss weighted batch requests for QPS-based rebalancing. We could also discuss how CPU is attributed to a given replica and how we decided between the two alternative approaches presented in https://docs.google.com/document/d/1_FJvTV0y3q0nOC-1xjfoyPZfo1yEV2oMriQek7LT7dg/edit#heading=h.g5iv0fx0qnlf. Also, you could mention the details of how multi-store rebalancing works in each case.
Let's not let all of your hard-earned learning be lost to the sands of time.
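(For readers following the thread, a minimal sketch of the enum under discussion. The constant names and comment text are assumptions inferred from the setting's `qps`/`cpu` values, not quoted from the diff.)

```go
// A sketch of the rebalancing objective enum (names assumed, not from the diff).
type LBRebalancingObjective int64

const (
	// LBRebalancingQueries balances the number of batch requests (QPS)
	// observed on each range and store.
	LBRebalancingQueries LBRebalancingObjective = iota
	// LBRebalancingCPU balances the sum of replica CPU time observed on
	// each store.
	LBRebalancingCPU
)
```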
pkg/kv/kvserver/rebalance_objective.go
line 68 at r3 (raw file):
// uniquely to a load.Dimension. However, in the future it is forseeable that // LBRebalancingObjective could be a value that encompassese many different // dimension within a single objective e.g. bytes written, cpu usage and
"dimensions"
pkg/kv/kvserver/rebalance_objective.go
line 85 at r3 (raw file):
// of the cluster. It is possible that the cluster setting objective may not be // the objective returned, when the cluster environment is unsupported or mixed // mixed versions exist.
"mixed mixed"
pkg/kv/kvserver/rebalance_objective.go
line 86 at r3 (raw file):
// the objective returned, when the cluster environment is unsupported or mixed // mixed versions exist. type RebalanceObjectiveProvider interface {
Does this type need to be exported? Same question for `ResolveLBRebalancingObjective`.
pkg/kv/kvserver/rebalance_objective.go
line 92 at r3 (raw file):
// RebalanceObjectiveManager implements the RebalanceObjectiveProvider // interface and registering a callback at creation time, that will be called
"registers"
pkg/kv/kvserver/rebalance_objective.go
line 165 at r3 (raw file):
// in such cases we cannot balance another objective since the data may not // exist. Fall back to QPS balancing. if !st.Version.IsActive(ctx, clusterversion.V23_1AllocatorCPUBalancing) {
Is there a risk of some nodes in the system recognizing that the cluster has been fully upgraded to v23.1 and others not? The propagation of this information is asynchronous and there's no bound on the delay. Do you have a sense of how bad the thrashing could be if two nodes in the system are using different objectives?
pkg/kv/kvserver/rebalance_objective.go
line 170 at r3 (raw file):
} // When the cpu timekeeping utility is unsupported on this aarch, the cpu // usage cannot be gathered. Fall back to QPS balancing.
What if another node in the system does not support grunning? Will that lead to different nodes using different objectives? We may not need to handle that case well, but we should make sure that we don't handle it catastrophically.
pkg/kv/kvserver/store.go
line 1225 at r3 (raw file):
s.rebalanceObjManager = newRebalanceObjectiveManager(ctx, s.cfg.Settings, func(ctx context.Context, obj LBRebalancingObjective) { s.VisitReplicas(func(r *Replica) bool {
nit: can we name the return arg so that it's clear what it signifies?
pkg/kv/kvserver/store_rebalancer.go
line 268 at r3 (raw file):
hottestRanges := sr.replicaRankings.TopLoad() options := sr.scorerOptions(ctx) rctx := sr.NewRebalanceContext(ctx, options, hottestRanges, sr.RebalanceMode())
`mode`? `RebalanceMode` and `RebalanceObjective` are both volatile. They can change each time they are checked. Do we need to be any more deliberate than we already are about ensuring that a single pass through the allocator only acts on a single mode/objective?
pkg/kv/kvserver/allocator/range_usage_info.go
line 45 at r2 (raw file):
} // TransferImpact returns the impact of transferring the lease for the range,
"impact" is ambiguous. Is it implied here that the outgoing leaseholder will see a reduction in its load by the returned amount and the incoming leaseholder will see an increase in its load by the returned amount? If so, consider spelling that out in the comment.
pkg/kv/kvserver/allocator/allocatorimpl/threshold.go
line 129 at r3 (raw file):
// SetAllDims returns a load vector with all dimensions filled in with the // value given. func SetAllDims(v float64) load.Load {
nit: `WithAllDims` might be a bit more reflective of what the function does. I would expect a function starting with `Set` to accept an object to set the value on. Feel free to ignore.
pkg/kv/kvserver/allocator/storepool/store_pool.go
line 645 at r3 (raw file):
if toDetail := sp.GetStoreDetailLocked(old.StoreID); toDetail != nil { toDetail.Desc.Capacity.RangeCount-- toDetail.Desc.Capacity.CPUPerSecond -= rangeUsageInfo.RaftCPUNanosPerSecond
Should this saturate at 0 instead of going negative as well?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
", however, it will target"
Updated.
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Release note (ops change):
or this will trip up automation.
Updated
pkg/kv/kvserver/rebalance_objective.go
line 32 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: something's off with this indentation.
Updated
pkg/kv/kvserver/rebalance_objective.go
line 37 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This feels like the right place to add some discussion about the details of each of these approaches. For instance, we could discuss weighted batch requests for QPS-based rebalancing. We could also discuss how CPU is attributed to a given replica and how we decided between the two alternative approaches presented in https://docs.google.com/document/d/1_FJvTV0y3q0nOC-1xjfoyPZfo1yEV2oMriQek7LT7dg/edit#heading=h.g5iv0fx0qnlf. Also, you could mention the details of how multi-store rebalancing works in each case.
Let's not let all of your hard-earned learning be lost to the sands of time.
Updated.
pkg/kv/kvserver/rebalance_objective.go
line 68 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"dimensions"
Updated
pkg/kv/kvserver/rebalance_objective.go
line 85 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"mixed mixed"
Updated
pkg/kv/kvserver/rebalance_objective.go
line 86 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does this type need to be exported? Same question for `ResolveLBRebalancingObjective`.
Exported for use in the simulator shortly. It is passed as an argument to create a new store rebalancer.
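(A minimal sketch of the exported surface in question; the single-method shape is an assumption based on the `Objective` signature quoted later in this thread, not the literal diff.)

```go
// RebalanceObjectiveProvider mediates access to the current rebalance
// objective (sketch; exact method set assumed).
type RebalanceObjectiveProvider interface {
	// Objective returns the current rebalance objective.
	Objective(ctx context.Context) LBRebalancingObjective
}
```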
pkg/kv/kvserver/rebalance_objective.go
line 92 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"registers"
Updated.
pkg/kv/kvserver/rebalance_objective.go
line 165 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is there a risk of some nodes in the system recognizing that the cluster has been fully upgraded to v23.1 and others not? The propagation of this information is asynchronous and there's no bound on the delay. Do you have a sense of how bad the thrashing could be if two nodes in the system are using different objectives?
There's a risk: thrashing could occur between the new and old nodes if the different versions lasted >= 2x the store rebalancer interval (1 min default).
Thrashing would only occur, however, if the values for the objectives are fairly different - like `allocbench/nodes=7/cpu=8/kv/r=95/ops=skew`. Even then, thrashing isn't guaranteed, but it is likely.
Are there alternatives to this approach? I don't consider this a massive risk, but I'm curious if there are better approaches that eliminate it.
pkg/kv/kvserver/rebalance_objective.go
line 170 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What if another node in the system does not support grunning? Will that lead to different nodes using different objectives? We may not need to handle that case well, but we should make sure that we don't handle it catastrophically.
It will lead to nodes using different objectives. However, the impact is much worse: the stores on nodes that don't support grunning will gossip `CPU=0` and every other store will move load onto them.
I've reworked a few things to now gossip -1 CPU when the store is on a node that doesn't support cpu attribution. `ResolveRebalancingObjective` now checks the gossiped store descriptors and will revert to QPS if any of them have `CPUPerSecond` set to -1. I want to avoid adding another field to gossip if possible. Overloading the value itself does complicate things, but it seems logical given that the value would otherwise be 0 and a negative value should never occur otherwise.
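(To make the fallback chain concrete, a sketch under the assumptions named in this thread: the version gate is quoted from the diff, `grunning.Supported()` is referenced later in the review, and the -1 sentinel is described above; the setting accessor name and exact signature are illustrative.)

```go
// Sketch of the objective fallback chain (signature and setting accessor
// are assumptions, not the literal diff).
func ResolveLBRebalancingObjective(
	ctx context.Context, st *cluster.Settings, descs []roachpb.StoreDescriptor,
) LBRebalancingObjective {
	// Pre-23.1 stores don't gossip CPU, so the data to balance on may not
	// exist; fall back to QPS.
	if !st.Version.IsActive(ctx, clusterversion.V23_1AllocatorCPUBalancing) {
		return LBRebalancingQueries
	}
	// This node can't measure replica CPU (no grunning support).
	if !grunning.Supported() {
		return LBRebalancingQueries
	}
	// Some store gossiped the -1 sentinel: its node can't measure CPU, so
	// revert to QPS cluster-wide rather than rebalancing load onto it.
	for _, desc := range descs {
		if desc.Capacity.CPUPerSecond < 0 {
			return LBRebalancingQueries
		}
	}
	return LBRebalancingObjective(LoadBasedRebalancingObjective.Get(&st.SV))
}
```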
pkg/kv/kvserver/store.go
line 1225 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: can we name the return arg so that it's clear what it signifies?
Updated.
pkg/kv/kvserver/store_rebalancer.go
line 268 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
`mode`? `RebalanceMode` and `RebalanceObjective` are both volatile. They can change each time they are checked. Do we need to be any more deliberate than we already are about ensuring that a single pass through the allocator only acts on a single mode/objective?
Updated this to only instantiate in the outer process loop, which feeds the argument to the scorer options and acts as the source of truth.
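(A sketch of the resulting shape: resolve the volatile mode/objective once in the outer loop and thread them through. The method and helper names here are illustrative, not the literal diff.)

```go
// Sketch: a single pass reads the volatile settings exactly once, so a
// concurrent settings change cannot mix objectives within the pass.
func (sr *StoreRebalancer) rebalanceStorePass(ctx context.Context) {
	mode := sr.RebalanceMode()           // read once per pass
	objective := sr.RebalanceObjective() // read once per pass
	options := sr.scorerOptionsForObjective(objective)
	hottestRanges := sr.replicaRankings.TopLoad()
	rctx := sr.NewRebalanceContext(ctx, options, hottestRanges, mode)
	// ... the rest of the pass consults rctx only.
	_ = rctx
}
```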
pkg/kv/kvserver/allocator/range_usage_info.go
line 45 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"impact" is ambiguous. Is it implied here that the outgoing leaseholder will see a reduction in its load by the returned amount and the incoming leaseholder will see an increase in its load by the returned amount? If so, consider spelling that out in the comment.
Updated to spell this out.
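(A sketch of the spelled-out contract; per the commit message, `TransferImpact` is currently identical to `Load`, which is assumed here to be the existing accessor on `RangeUsageInfo`.)

```go
// TransferImpact returns the load that moves with the lease: the outgoing
// leaseholder's store sheds this amount and the store receiving the lease
// gains the same amount (sketch of the clarified comment).
func (r RangeUsageInfo) TransferImpact() load.Load {
	return r.Load()
}
```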
pkg/kv/kvserver/allocator/allocatorimpl/threshold.go
line 129 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: `WithAllDims` might be a bit more reflective of what the function does. I would expect a function starting with `Set` to accept an object to set the value on. Feel free to ignore.
I agree, updated.
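(A sketch of the renamed helper, assuming `load.Vector` is the concrete `load.Load` implementation with one float64 per dimension.)

```go
// WithAllDims returns a load vector with every dimension set to v (sketch).
func WithAllDims(v float64) load.Load {
	var vec load.Vector
	for i := range vec {
		vec[i] = v
	}
	return vec
}
```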
pkg/kv/kvserver/allocator/storepool/store_pool.go
line 645 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this saturate at 0 instead of going negative as well?
This should saturate at 0 too. Updated.
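(A standalone sketch of the saturating decrement agreed on here; field and value types are illustrative.)

```go
package main

import "fmt"

// saturatingSub decrements cur by delta, clamping at zero so that estimate
// drift can never drive a gossiped capacity counter negative.
func saturatingSub(cur, delta float64) float64 {
	if delta >= cur {
		return 0
	}
	return cur - delta
}

func main() {
	fmt.Println(saturatingSub(100, 30))  // 70
	fmt.Println(saturatingSub(100, 130)) // 0, not -30
}
```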
Reviewed 1 of 3 files at r5, 20 of 20 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)
pkg/kv/kvserver/rebalance_objective.go
line 165 at r3 (raw file):
Previously, kvoli (Austen) wrote…
There's a risk: thrashing could occur between the new and old nodes if the different versions lasted >= 2x the store rebalancer interval (1 min default). Thrashing would only occur, however, if the values for the objectives are fairly different - like `allocbench/nodes=7/cpu=8/kv/r=95/ops=skew`. Even then, thrashing isn't guaranteed, but it is likely. Are there alternatives to this approach? I don't consider this a massive risk, but I'm curious if there are better approaches that eliminate it.
One approach that could be workable here is to include each node's rebalancing objective in its gossiped store descriptor or store capacity. Nodes would then only rebalance to other nodes that are still using the same rebalance objective. Without any synchronous coordination to verify the objective before rebalancing (think causality passing), this is still racy, but I think it would have less chance of thrashing. Once either node noticed that the other was using a different objective than it, it would stop rebalancing and wait for the objectives to converge. Or, if they never converged (see below with the grunning question), this would prevent rebalancing between incompatible nodes entirely.
EDIT: This is similar to what you're doing now with `CPUPerSecond == -1`. What you're doing there makes sense.
pkg/kv/kvserver/rebalance_objective.go
line 52 at r6 (raw file):
// impact of an action by using the QPS of the leaseholder replica invovled // e.g. the impact of lease transfers on the stores invovled is // +leaseholder replica QPS on the store that receives the lease and
The inclusion of the "replica" here but not in the next line looks intentional. Is it?
pkg/kv/kvserver/rebalance_objective.go
line 64 at r6 (raw file):
// LBRebalancingCPU is a rebalance objective that aims balances the store // CPU usage. The store CPU usage is calculated as the sum of replica's cpu
replicas'
pkg/kv/kvserver/rebalance_objective.go
line 69 at r6 (raw file):
the behavior doesn't change
This could mean a few different things. Could you call out that we try to balance CPU across stores on the same node?
pkg/kv/kvserver/rebalance_objective.go
line 77 at r6 (raw file):
and balance each stores' process usage
Consider adding something like ", using the measured replica cpu usage only to determine which replica to rebalance, but not when to rebalance or who to rebalance to". Or something along those lines to better contrast the two approaches.
pkg/kv/kvserver/rebalance_objective.go
line 80 at r6 (raw file):
// process usage. This approach benefits from observing the "true" cpu // usage, rather than just the sum of replica's usage. However, unlike the // implemented approach, the estmated impact of actions was less reliable
"estimated"
pkg/kv/kvserver/store.go
line 2612 at r6 (raw file):
// balancing objective is permitted. If this is not updated, the cpu per // second will be zero and other stores will likely begin rebalancing // towards this store as it will appear underfull.
This is probably not worth it, but if there is a concern about forgetting that -1 is a sentinel value in some code paths and rebalancing towards a node with `!grunning.Supported()`, we could also make `math.MaxInt64` the sentinel value.
pkg/kv/kvserver/allocator/storepool/store_pool.go
line 691 at r6 (raw file):
// When CPU attribution is unsupported, the store will set the // CPUPerSecond of its store capacity to be -1. if fromDetail.Desc.Capacity.CPUPerSecond >= 0 {
These all feel a bit error-prone, but I don't have a good suggestion for how to make it harder to forget to check this.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvserver/rebalance_objective.go
line 52 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The inclusion of the "replica" here but not in the next line looks intentional. Is it?
Somewhat intentional but now that I think about it, it doesn't seem right to make any distinction. Updated to be leaseholder replica for both.
pkg/kv/kvserver/rebalance_objective.go
line 64 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
replicas'
updated
pkg/kv/kvserver/rebalance_objective.go
line 69 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
the behavior doesn't change
This could mean a few different things. Could you call out that we try to balance CPU across stores on the same node?
Updated with an example.
pkg/kv/kvserver/rebalance_objective.go
line 77 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
and balance each stores' process usage
Consider adding something like ", using the measured replica cpu usage only to determine which replica to rebalance, but not when to rebalance or who to rebalance to". Or something along those lines to better contrast the two approaches.
Added
pkg/kv/kvserver/rebalance_objective.go
line 80 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"estimated"
Updated
pkg/kv/kvserver/store.go
line 2612 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is probably not worth it, but if there is a concern about forgetting that -1 is a sentinel value in some code paths and rebalancing towards a node with `!grunning.Supported()`, we could also make `math.MaxInt64` the sentinel value.
I prefer the negative value because it feels like a pattern we could follow if we bake it into the store pool updates - see below comment.
pkg/kv/kvserver/allocator/storepool/store_pool.go
line 691 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
These all feel a bit error-prone, but I don't have a good suggestion for how to make it harder to forget to check this.
There's probably some work that could go into the storepool either this release or early next release to make this easier to handle. Something akin to "virtualizing" the store descriptors and keeping the underlying gossiped descriptor as a physical value of sorts.
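(One illustrative way to make the sentinel check harder to forget is a small guard helper that callers consult instead of comparing against -1 directly; this is not from the diff.)

```go
// hasCPUAttribution reports whether a gossiped capacity carries a real CPU
// measurement; -1 is the sentinel gossiped by stores on nodes without CPU
// attribution (sketch).
func hasCPUAttribution(c roachpb.StoreCapacity) bool {
	return c.CPUPerSecond >= 0
}
```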
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvserver/rebalance_objective.go
line 186 at r9 (raw file):
// Objective returns the current rebalance objective. func (rom *RebalanceObjectiveManager) Objective(ctx context.Context) LBRebalancingObjective { rom.maybeUpdateRebalanceObjective(ctx)
I added this `maybeUpdateRebalanceObjective` so that the storepool would be checked for CPU=-1, which makes the check more expensive. That isn't an issue for CPU balancing, since the objective is only checked when calling `Capacity` and every time the store rebalancer wakes up. It is an issue for CPU load splitting, which checks the `RebalanceObjective` much more frequently, including on every request to the replica.
I'm going to change the design of how we get the information. I'll ping when I've updated.
Reviewed 1 of 3 files at r11, 22 of 22 files at r12, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)
pkg/kv/kvserver/rebalance_objective.go
line 69 at r6 (raw file):
Previously, kvoli (Austen) wrote…
Updated with an example.
Thanks, the added example is helpful.
pkg/kv/kvserver/rebalance_objective.go
line 202 at r12 (raw file):
capacityChangeNotifier.SetOnCapacityChange( func(storeID roachpb.StoreID, old, cur roachpb.StoreCapacity) { if old.CPUPerSecond < 0 && cur.CPUPerSecond >= 0 ||
This might be easier to understand if written as if (old.CPUPerSecond < 0) != (new.CPUPerSecond < 0) {
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvserver/rebalance_objective.go
line 202 at r12 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This might be easier to understand if written as `if (old.CPUPerSecond < 0) != (new.CPUPerSecond < 0) {`
That is easier to understand than what I had, updated to this.
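(The adopted condition, written out as a helper for clarity; the helper name is illustrative.)

```go
// cpuSentinelFlipped reports whether a store's CPUPerSecond crossed the -1
// sentinel boundary in either direction, which is the only capacity change
// that can alter the resolved objective.
func cpuSentinelFlipped(old, cur roachpb.StoreCapacity) bool {
	return (old.CPUPerSecond < 0) != (cur.CPUPerSecond < 0)
}
```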
Previously `Queries` was hardcoded as the dimension to use when creating test ranges for use in store rebalancer tests. This patch enables passing in any `dimension`.

Release note: None

Previously, when estimating the impact a lease transfer would have, we would not differentiate between rebalance/transfer. This commit adds a utility method `TransferImpact` to `RangeUsageInfo` which is now used when making estimations about lease transfers. Currently, this is identical to `Load`, which was previously used instead for transfers.

Release note: None

This patch allows the store rebalancer to use CPU in place of QPS when balancing load on a cluster. This patch adds `cpu` as an option with the cluster setting `kv.allocator.load_based_rebalancing.objective`. When set to `cpu` rather than `qps`, the store rebalancer will perform a mostly identical function, however, it will target balancing the sum of all replicas' cpu time on each store, rather than qps. The default remains as `qps` here. Similar to QPS, the rebalance threshold can be set to control the range above and below the mean at which store CPU is considered imbalanced, either overfull or underfull respectively: `kv.allocator.cpu_rebalance_threshold`: 0.1. In order to manage with mixed versions during upgrade and some architectures not supporting the cpu sampling method, a rebalance objective manager is introduced in `rebalance_objective.go`. The manager mediates access to the rebalance objective and overwrites it in cases where the objective set in the cluster setting cannot be supported.

resolves: cockroachdb#95380

Release note (ops change): Add option to balance cpu time (cpu) instead of queries per second (qps) among stores in a cluster. This is done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`. `kv.allocator.cpu_rebalance_threshold` is also added, similar to `kv.allocator.qps_rebalance_threshold`, to control the target range for store cpu above and below the cluster mean.
TYFTR
bors r=nvanbenschoten
Build succeeded:
96128: kvserver: introduce cpu load based splits r=nvanbenschoten a=kvoli

This PR adds the ability to perform load based splitting with replica cpu usage rather than queries per second. Load based splitting now will use either cpu or qps for deciding split points, depending on the cluster setting `kv.allocator.load_based_rebalancing.objective`. When set to `qps`, qps is used in deciding split points and when splitting should occur; similarly, `cpu` means that request cpu against the leaseholder replica is used to decide split points. The split threshold when using `cpu` is the cluster setting `kv.range_split.load_cpu_threshold`, which defaults to `250ms` of cpu time per second, i.e. a replica consistently using 1/4 of a machine's processor. The merge queue uses the load based splitter to make decisions on whether to merge two adjacent ranges due to low load. This commit also updates the merge queue to be consistent with the load based splitter signal. When switching between `qps` and `cpu`, the load based splitter for every replica is reset to avoid spurious results.

![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png)

resolves: #95377
resolves: #83519
depends on: #96127

Release note (ops change): Load based splitter now supports using request cpu usage to split ranges. This is introduced with the previous cluster setting `kv.allocator.load_based_rebalancing.objective`, which when set to `cpu`, will use request cpu usage. The threshold above which CPU usage of a range is considered for splitting is defined in the cluster setting `kv.range_split.load_cpu_threshold`, which has a default value of `250ms`.

96129: bazel: is_dev_linux config setting should be false when cross_flag is true r=rickystewart a=srosenberg

When cross-compiling on a gce worker and invoking bazel directly, the build fails with "cannot find -lresolv_wrapper", owing to the fact that is_dev_linux is not mutually exclusive of cross_linux. That is, when both configs are active, "-lresolv_wrapper" is passed to clinkopts in pkg/ccl/gssapiccl/BUILD.bazel; the cross-compiler doesn't have this lib, nor does it need it. Note, when using ./dev instead of bazel, the above issue is side-stepped because the dev wrapper invokes bazel inside docker, which ignores ~/.bazelrc. The workaround is to make is_dev_linux false when cross_flag is true.

Epic: none
Release note: None

96431: ui: databases shows partial results for size limit error r=j82w a=j82w

The databases page displays partial results instead of just showing an error message. Sorting is disabled if there are more than 2 pages of results, which is currently configured to 40 dbs. This still allows most users to use the sort functionality, but prevents large customers from breaking when it would need to do a network call per database. The database details are now loaded on demand for the first page only. Previously a network call was done for all databases, which would result in 2k network calls. It now only loads the page of details the user is looking at.

part of: #94333
https://www.loom.com/share/31b213b2f1764d0f9868bd967b9388b8

Release note: none

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
Co-authored-by: j82w <[email protected]>
In cockroachdb#96127 we added the option to load balance replica CPU instead of QPS across stores in a cluster. It is desirable to view the signal being controlled for rebalancing in the replication dashboard, similar to QPS. This commit adds the `rebalancing.cpunanospersecond` metric to the replication metrics dashboard. Resolves: cockroachdb#98109 Release note (ui change): `rebalancing.cpunanospersecond` is now included in the replication metrics dashboard.
97717: multitenant: add AdminUnsplitRequest capability r=knz a=ecwall

Fixes #97716

1) Add a new `tenantcapabilitiespb.CanAdminUnsplit` capability.
2) Check capability in `Authorizer.HasCapabilityForBatch`.
3) Remove `ExecutorConfig.RequireSystemTenant` call from `execFactory.ConstructAlterTableUnsplit`, `execFactory.ConstructAlterTableUnsplitAll`.

Release note: None

98250: kvserver: add minimum cpu lb split threshold r=andrewbaptist a=kvoli

Previously, `kv.range_split.load_cpu_threshold` had no minimum setting value. It is undesirable to allow users to set this setting too low, as excessive splitting may occur. `kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`. See #96869 for additional context on the threshold.

Resolves: #98107

Release note (ops change): `kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`.

98270: dashboards: add replica cpu to repl dashboard r=xinhaoz a=kvoli

In #96127 we added the option to load balance replica CPU instead of QPS across stores in a cluster. It is desirable to view the signal being controlled for rebalancing in the replication dashboard, similar to QPS. This pr adds the `rebalancing.cpunanospersecond` metric to the replication metrics dashboard. The avg QPS graph on the replication dashboard previously described the metric as "Exponentially weighted average", however this is not true. This pr updates the description to just be "moving average", which is accurate. Note that follow-the-workload does use an exponentially weighted value, however the metric in the dashboard is not the same. This pr also updates the graph header to include Replica in the title: "Average Replica Queries per Node". QPS is specific to replicas. This is already mentioned in the description.

Resolves: #98109

98289: storage: mvccExportToWriter should always return buffered range keys r=adityamaru a=stevendanna

In #96691, we changed the return type of mvccExportToWriter such that it now indicates when a CPU limit has been reached. As part of that change, when the CPU limit was reached, we would immediately `return` rather than `break`ing out of the loop. This introduced a bug, since the code after the loop that the `break` was taking us to is important. Namely, we may have previously buffered range keys that we still need to write into our response. By replacing the break with a return, these range keys were lost. The end-user impact of this is that a BACKUP that _ought_ to have included range keys (such as a backup of a table with a rolled back IMPORT) would not include those range keys and thus would end up resurrecting deleted keys upon restore. This PR brings back the `break` and adds a test that covers exporting with range keys under CPU exhaustion. This bug only ever existed on master.

Informs #97694
Epic: none
Release note: None

98329: sql: fix iteration conditions in crdb_internal.scan r=ajwerner a=stevendanna

Rather than using the Next() key of the last key in the response when iterating, we should use the resume span. The previous code could result in a failure in the rare case that the end key of our scan exactly matched the successor key of the very last key in the iteration.

Epic: none
Release note: None

Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
This patch allows the store rebalancer to use CPU in place of QPS when balancing load on a cluster. This patch adds `cpu` as an option with the cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `cpu`, rather than `qps`, the store rebalancer will perform a mostly identical function, however, it will target balancing the sum of all replicas' cpu time on each store, rather than qps. The default remains as `qps` here.

Similar to QPS, the rebalance threshold can be set to control the range above and below the mean at which store CPU is considered imbalanced, either overfull or underfull respectively:

`kv.allocator.cpu_rebalance_threshold`: 0.1

In order to manage with mixed versions during upgrade and some architectures not supporting the cpu sampling method, a rebalance objective manager is introduced in `rebalance_objective.go`. The manager mediates access to the rebalance objective and overwrites it in cases where the objective set in the cluster setting cannot be supported.

The results when using CPU in comparison to QPS can be found here (internal):
Results Summary
Detailed Allocbench Results

resolves: #95380

Release note (ops change): Add option to balance cpu time (cpu) instead of queries per second (qps) among stores in a cluster. This is done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`. `kv.allocator.cpu_rebalance_threshold` is also added, similar to `kv.allocator.qps_rebalance_threshold`, to control the target range for store cpu above and below the cluster mean.