
kvserver: introduce cpu rebalancing #96127

Merged
merged 3 commits into cockroachdb:master on Feb 9, 2023

Conversation

kvoli
Collaborator

@kvoli kvoli commented Jan 27, 2023

This patch allows the store rebalancer to use CPU in place of QPS when
balancing load on a cluster. It adds cpu as an option for the
cluster setting:

kv.allocator.load_based_rebalancing.objective

When set to cpu rather than qps, the store rebalancer performs a mostly
identical function; however, it targets balancing the sum of all
replicas' cpu time on each store rather than qps. The default remains
qps.

Similar to QPS, a rebalance threshold can be set to control how far
above or below the mean store CPU a store must be before it is
considered imbalanced, either overfull or underfull respectively:

kv.allocator.cpu_rebalance_threshold: 0.1

To handle mixed versions during upgrade, and architectures that do not
support the cpu sampling method, a rebalance objective manager is
introduced in rebalance_objective.go. The manager mediates access to
the rebalance objective and overrides it in cases where the objective
set in the cluster setting cannot be supported.
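
As a rough illustration of that fallback (this is not the actual implementation; the names `resolveObjective`, `cpuBalancingVersionActive` and `grunningSupported` are hypothetical), a minimal Go sketch:

```go
package main

import "fmt"

// LBRebalancingObjective mirrors the idea behind the
// kv.allocator.load_based_rebalancing.objective cluster setting.
type LBRebalancingObjective int

const (
	LBRebalancingQueries LBRebalancingObjective = iota // "qps"
	LBRebalancingCPU                                   // "cpu"
)

// resolveObjective honors the configured objective only when every
// prerequisite is met; otherwise it falls back to QPS balancing.
func resolveObjective(
	configured LBRebalancingObjective,
	cpuBalancingVersionActive bool, // all nodes on a version that supports CPU balancing
	grunningSupported bool, // per-goroutine CPU sampling available on this architecture
) LBRebalancingObjective {
	if configured == LBRebalancingCPU &&
		(!cpuBalancingVersionActive || !grunningSupported) {
		return LBRebalancingQueries
	}
	return configured
}

func main() {
	// A cpu objective on an unsupported architecture resolves to qps (prints 0).
	fmt.Println(resolveObjective(LBRebalancingCPU, true, false))
}
```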

The results when using CPU in comparison to QPS can be found here (internal).

Results Summary

(summary charts: see the internal results link above)

Detailed Allocbench Results
kv/r=0/access=skew
master
    median cost(gb):05.81 cpu(%):14.97 write(%):37.83
    stddev cost(gb):01.87 cpu(%):03.98 write(%):07.01
cpu rebalancing
    median cost(gb):08.76 cpu(%):14.42 write(%):36.61
    stddev cost(gb):02.66 cpu(%):01.85 write(%):04.80
kv/r=0/ops=skew
master
    median cost(gb):06.23 cpu(%):26.05 write(%):57.33
    stddev cost(gb):02.92 cpu(%):05.83 write(%):08.20
cpu rebalancing
    median cost(gb):04.28 cpu(%):11.45 write(%):31.28
    stddev cost(gb):02.25 cpu(%):02.51 write(%):06.68
kv/r=50/ops=skew
master
    median cost(gb):04.36 cpu(%):22.84 write(%):48.09
    stddev cost(gb):01.12 cpu(%):02.71 write(%):05.51
cpu rebalancing
    median cost(gb):04.64 cpu(%):13.49 write(%):43.05
    stddev cost(gb):01.07 cpu(%):01.26 write(%):08.58
kv/r=95/access=skew
master
    median cost(gb):00.00 cpu(%):09.51 write(%):01.24
    stddev cost(gb):00.00 cpu(%):01.74 write(%):00.27
cpu rebalancing
    median cost(gb):00.00 cpu(%):05.66 write(%):01.31
    stddev cost(gb):00.00 cpu(%):01.56 write(%):00.26
kv/r=95/ops=skew
master
    median cost(gb):0.00 cpu(%):47.29 write(%):00.93
    stddev cost(gb):0.09 cpu(%):04.30 write(%):00.17
cpu rebalancing
    median cost(gb):0.00 cpu(%):08.16 write(%):01.30
    stddev cost(gb):0.01 cpu(%):04.59 write(%):00.20

resolves: #95380

Release note (ops change): Add an option to balance cpu time (cpu)
instead of queries per second (qps) among stores in a cluster. This is
done by setting kv.allocator.load_based_rebalancing.objective='cpu'.
kv.allocator.cpu_rebalance_threshold is also added, similar to
kv.allocator.qps_rebalance_threshold, to control the target range for
store cpu above and below the cluster mean.

@cockroach-teamcity
Member

This change is Reviewable

@kvoli kvoli marked this pull request as ready for review January 27, 2023 22:24
@kvoli kvoli requested a review from a team as a code owner January 27, 2023 22:24
@kvoli kvoli self-assigned this Jan 27, 2023
@kvoli kvoli changed the title from "kvserver: introdue cpu rebalancing" to "kvserver: introduce cpu rebalancing" Jan 27, 2023
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 20 of 20 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


-- commits line 33 at r3:
", however, it will target"


-- commits line 51 at r3:
Release note (ops change): or this will trip up automation.


pkg/kv/kvserver/rebalance_objective.go line 32 at r3 (raw file):

// qps: which is the original default setting and looks at the number of batch
//
//	requests, on a range and store.

nit: something's off with this indentation.


pkg/kv/kvserver/rebalance_objective.go line 37 at r3 (raw file):

type LBRebalancingObjective int64

const (

This feels like the right place to add some discussion about the details of each of these approaches. For instance, we could discuss weighted batch requests for QPS-based rebalancing. We could also discuss how CPU is attributed to a given replica and how we decided between the two alternative approaches presented in https://docs.google.com/document/d/1_FJvTV0y3q0nOC-1xjfoyPZfo1yEV2oMriQek7LT7dg/edit#heading=h.g5iv0fx0qnlf. Also, you could mention the details of how multi-store rebalancing works in each case.

Let's not let all of your hard-earned learning be lost to the sands of time.


pkg/kv/kvserver/rebalance_objective.go line 68 at r3 (raw file):

// uniquely to a load.Dimension. However, in the future it is forseeable that
// LBRebalancingObjective could be a value that encompassese many different
// dimension within a single objective e.g. bytes written, cpu usage and

"dimensions"


pkg/kv/kvserver/rebalance_objective.go line 85 at r3 (raw file):

// of the cluster. It is possible that the cluster setting objective may not be
// the objective returned, when the cluster environment is unsupported or mixed
// mixed versions exist.

"mixed mixed"


pkg/kv/kvserver/rebalance_objective.go line 86 at r3 (raw file):

// the objective returned, when the cluster environment is unsupported or mixed
// mixed versions exist.
type RebalanceObjectiveProvider interface {

Does this type need to be exported? Same question for ResolveLBRebalancingObjective.


pkg/kv/kvserver/rebalance_objective.go line 92 at r3 (raw file):

// RebalanceObjectiveManager implements the RebalanceObjectiveProvider
// interface and registering a callback at creation time, that will be called

"registers"


pkg/kv/kvserver/rebalance_objective.go line 165 at r3 (raw file):

	// in such cases we cannot balance another objective since the data may not
	// exist. Fall back to QPS balancing.
	if !st.Version.IsActive(ctx, clusterversion.V23_1AllocatorCPUBalancing) {

Is there a risk of some nodes in the system recognizing that the cluster has been fully upgraded to v23.1 and others not? The propagation of this information is asynchronous and there's no bound on the delay. Do you have a sense of how bad the thrashing could be if two nodes in the system are using different objectives?


pkg/kv/kvserver/rebalance_objective.go line 170 at r3 (raw file):

	}
	// When the cpu timekeeping utility is unsupported on this aarch, the cpu
	// usage cannot be gathered. Fall back to QPS balancing.

What if another node in the system does not support grunning? Will that lead to different nodes using different objectives? We may not need to handle that case well, but we should make sure that we don't handle it catastrophically.


pkg/kv/kvserver/store.go line 1225 at r3 (raw file):

	s.rebalanceObjManager = newRebalanceObjectiveManager(ctx, s.cfg.Settings,
		func(ctx context.Context, obj LBRebalancingObjective) {
			s.VisitReplicas(func(r *Replica) bool {

nit: can we name the return arg so that it's clear what it signifies?


pkg/kv/kvserver/store_rebalancer.go line 268 at r3 (raw file):

			hottestRanges := sr.replicaRankings.TopLoad()
			options := sr.scorerOptions(ctx)
			rctx := sr.NewRebalanceContext(ctx, options, hottestRanges, sr.RebalanceMode())

mode?

RebalanceMode and RebalanceObjective are both volatile. They can change each time they are checked. Do we need to be any more deliberate than we already are about ensuring that a single pass through the allocator only acts on a single mode/objective?


pkg/kv/kvserver/allocator/range_usage_info.go line 45 at r2 (raw file):

}

// TransferImpact returns the impact of transferring the lease for the range,

"impact" is ambiguous. Is it implied here that the outgoing leaseholder will see a reduction in its load by the returned amount and the incoming leaseholder will see an increase in its load by the returned amount? If so, consider spelling that out in the comment.


pkg/kv/kvserver/allocator/allocatorimpl/threshold.go line 129 at r3 (raw file):

// SetAllDims returns a load vector with all dimensions filled in with the
// value given.
func SetAllDims(v float64) load.Load {

nit: WithAllDims might be a bit more reflective of what the function does. I would expect a function starting with Set to accept an object to set the value on. Feel free to ignore.


pkg/kv/kvserver/allocator/storepool/store_pool.go line 645 at r3 (raw file):

			if toDetail := sp.GetStoreDetailLocked(old.StoreID); toDetail != nil {
				toDetail.Desc.Capacity.RangeCount--
				toDetail.Desc.Capacity.CPUPerSecond -= rangeUsageInfo.RaftCPUNanosPerSecond

Should this saturate at 0 instead of going negative as well?

@kvoli kvoli force-pushed the 230120.cpu-balancing branch from c8cf548 to 178aa23 on February 7, 2023 20:51
Collaborator Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


-- commits line 33 at r3:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

", however, it will target"

Updated.


-- commits line 51 at r3:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Release note (ops change): or this will trip up automation.

Updated


pkg/kv/kvserver/rebalance_objective.go line 32 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: something's off with this indentation.

Updated


pkg/kv/kvserver/rebalance_objective.go line 37 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This feels like the right place to add some discussion about the details of each of these approaches. For instance, we could discuss weighted batch requests for QPS-based rebalancing. We could also discuss how CPU is attributed to a given replica and how we decided between the two alternative approaches presented in https://docs.google.com/document/d/1_FJvTV0y3q0nOC-1xjfoyPZfo1yEV2oMriQek7LT7dg/edit#heading=h.g5iv0fx0qnlf. Also, you could mention the details of how multi-store rebalancing works in each case.

Let's not let all of your hard-earned learning be lost to the sands of time.

Updated.


pkg/kv/kvserver/rebalance_objective.go line 68 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"dimensions"

Updated


pkg/kv/kvserver/rebalance_objective.go line 85 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"mixed mixed"

Updated


pkg/kv/kvserver/rebalance_objective.go line 86 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this type need to be exported? Same question for ResolveLBRebalancingObjective.

Exported for use in the simulator shortly. It is passed as an argument to create a new store rebalancer.


pkg/kv/kvserver/rebalance_objective.go line 92 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"registers"

Updated.


pkg/kv/kvserver/rebalance_objective.go line 165 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is there a risk of some nodes in the system recognizing that the cluster has been fully upgraded to v23.1 and others not? The propagation of this information is asynchronous and there's no bound on the delay. Do you have a sense of how bad the thrashing could be if two nodes in the system are using different objectives?

There's a risk: thrashing could occur between the new and old nodes if the mixed versions lasted >= 2x the store rebalancer interval (1 min default).

Thrashing would only occur, however, if the values for the objectives are fairly different - like allocbench/nodes=7/cpu=8/kv/r=95/ops=skew. Even then, thrashing isn't guaranteed, but it is likely.

Are there alternatives to this approach? I don't consider this a massive risk, but I'm curious if there are better approaches that eliminate it.


pkg/kv/kvserver/rebalance_objective.go line 170 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What if another node in the system does not support grunning? Will that lead to different nodes using different objectives? We may not need to handle that case well, but we should make sure that we don't handle it catastrophically.

It will lead to nodes using different objectives. However, the impact is much worse: the stores on nodes that don't support grunning will gossip CPU=0 and every other store will move load onto them.

I've reworked a few things to now gossip -1 CPU when the store is on a node that doesn't support cpu attribution. ResolveRebalancingObjective now checks the gossiped store descriptors and will revert to QPS if any of them have CPUPerSecond set to -1. I want to avoid adding another field to gossip if possible. Overloading the value itself does complicate things, but it seems logical given that it would otherwise be 0 and a negative value should never occur otherwise.
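
As a simplified sketch of what that check amounts to (types reduced and the helper name is hypothetical, not the repository's code):

```go
// storeCapacity stands in for the gossiped store capacity; a CPUPerSecond of
// -1 marks a store whose node cannot attribute CPU usage.
type storeCapacity struct {
	CPUPerSecond float64
}

// cpuObjectiveSupported reports whether every gossiped store can report CPU
// usage. If any store advertises -1, the objective should revert to QPS.
func cpuObjectiveSupported(capacities []storeCapacity) bool {
	for _, c := range capacities {
		if c.CPUPerSecond < 0 {
			return false
		}
	}
	return true
}
```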


pkg/kv/kvserver/store.go line 1225 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: can we name the return arg so that it's clear what it signifies?

Updated.


pkg/kv/kvserver/store_rebalancer.go line 268 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

mode?

RebalanceMode and RebalanceObjective are both volatile. They can change each time they are checked. Do we need to be any more deliberate than we already are about ensuring that a single pass through the allocator only acts on a single mode/objective?

Updated this to only instantiate in the outer process loop, which feeds the argument to the scorer options and acts as the source of truth.


pkg/kv/kvserver/allocator/range_usage_info.go line 45 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"impact" is ambiguous. Is it implied here that the outgoing leaseholder will see a reduction in its load by the returned amount and the incoming leaseholder will see an increase in its load by the returned amount? If so, consider spelling that out in the comment.

Updated to spell this out.
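
Spelled out as code, the symmetry looks roughly like this (illustrative only; the types are simplified stand-ins, not the real RangeUsageInfo/load API):

```go
// storeLoad is a simplified stand-in for a store's load summary.
type storeLoad struct {
	QPS               float64
	CPUNanosPerSecond float64
}

// applyLeaseTransfer shows the intended symmetry: the outgoing leaseholder's
// store drops by the transfer impact and the incoming store gains it.
func applyLeaseTransfer(from, to, impact storeLoad) (newFrom, newTo storeLoad) {
	newFrom = storeLoad{from.QPS - impact.QPS, from.CPUNanosPerSecond - impact.CPUNanosPerSecond}
	newTo = storeLoad{to.QPS + impact.QPS, to.CPUNanosPerSecond + impact.CPUNanosPerSecond}
	return newFrom, newTo
}
```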


pkg/kv/kvserver/allocator/allocatorimpl/threshold.go line 129 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: WithAllDims might be a bit more reflective of what the function does. I would expect a function starting with Set to accept an object to set the value on. Feel free to ignore.

I agree, updated.


pkg/kv/kvserver/allocator/storepool/store_pool.go line 645 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this saturate at 0 instead of going negative as well?

This should saturate at 0 too. Updated.
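
A tiny sketch of the saturating update (hypothetical helper, not the store pool's code):

```go
// subtractSaturating clamps derived capacity values such as CPUPerSecond at
// zero rather than letting them go negative when a replica's usage is removed.
func subtractSaturating(current, delta float64) float64 {
	if next := current - delta; next > 0 {
		return next
	}
	return 0
}
```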

@kvoli kvoli force-pushed the 230120.cpu-balancing branch from 178aa23 to d8bf342 on February 8, 2023 00:52
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 1 of 3 files at r5, 20 of 20 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


pkg/kv/kvserver/rebalance_objective.go line 165 at r3 (raw file):

Previously, kvoli (Austen) wrote…

There's a risk: thrashing could occur between the new and old nodes if the mixed versions lasted >= 2x the store rebalancer interval (1 min default).

Thrashing would only occur, however, if the values for the objectives are fairly different - like allocbench/nodes=7/cpu=8/kv/r=95/ops=skew. Even then, thrashing isn't guaranteed, but it is likely.

Are there alternatives to this approach? I don't consider this a massive risk, but I'm curious if there are better approaches that eliminate it.

One approach that could be workable here is to include each node's rebalancing objective in its gossiped store descriptor or store capacity. Nodes would then only rebalance to other nodes that are still using the same rebalance objective. Without any synchronous coordination to verify the objective before rebalancing (think causality passing), this is still racy, but I think it would have less chance of thrashing. Once either node noticed that the other was using a different objective than it, it would stop rebalancing and wait for the objectives to converge. Or, if they never converged (see below with the grunning question), this would prevent rebalancing between incompatible nodes entirely.

EDIT: This is similar to what you're doing now with CPUPerSecond == -1. What you're doing there makes sense.


pkg/kv/kvserver/rebalance_objective.go line 52 at r6 (raw file):

	// impact of an action by using the QPS of the leaseholder replica invovled
	// e.g. the impact of lease transfers on the stores invovled is
	// +leaseholder replica QPS on the store that receives the lease and

The inclusion of the "replica" here but not in the next line looks intentional. Is it?


pkg/kv/kvserver/rebalance_objective.go line 64 at r6 (raw file):

	// LBRebalancingCPU is a rebalance objective that aims balances the store
	// CPU usage. The store CPU usage is calculated as the sum of replica's cpu

replicas'


pkg/kv/kvserver/rebalance_objective.go line 69 at r6 (raw file):

the behavior doesn't change

This could mean a few different things. Could you call out that we try to balance CPU across stores on the same node?


pkg/kv/kvserver/rebalance_objective.go line 77 at r6 (raw file):

and balance each stores' process usage

Consider adding something like ", using the measured replica cpu usage only to determine which replica to rebalance, but not when to rebalance or who to rebalance to". Or something along those lines to better contrast the two approaches.


pkg/kv/kvserver/rebalance_objective.go line 80 at r6 (raw file):

	// process usage. This approach benefits from observing the "true" cpu
	// usage, rather than just the sum of replica's usage. However, unlike the
	// implemented approach, the estmated impact of actions was less reliable

"estimated"


pkg/kv/kvserver/store.go line 2612 at r6 (raw file):

	// balancing objective is permitted. If this is not updated, the cpu per
	// second will be zero and other stores will likely begin rebalancing
	// towards this store as it will appear underfull.

This is probably not worth it, but if there is a concern about forgetting that -1 is a sentinel value in some code paths and rebalancing towards a node with !grunning.Supported(), we could also make math.MaxInt64 the sentinel value.


pkg/kv/kvserver/allocator/storepool/store_pool.go line 691 at r6 (raw file):

		// When CPU attribution is unsupported, the store will set the
		// CPUPerSecond of its store capacity to be -1.
		if fromDetail.Desc.Capacity.CPUPerSecond >= 0 {

These all feel a bit error-prone, but I don't have a good suggestion for how to make it harder to forget to check this.

@kvoli kvoli force-pushed the 230120.cpu-balancing branch from d8bf342 to 0b0dfd5 on February 8, 2023 13:56
Collaborator Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


pkg/kv/kvserver/rebalance_objective.go line 52 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The inclusion of the "replica" here but not in the next line looks intentional. Is it?

Somewhat intentional but now that I think about it, it doesn't seem right to make any distinction. Updated to be leaseholder replica for both.


pkg/kv/kvserver/rebalance_objective.go line 64 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

replicas'

updated


pkg/kv/kvserver/rebalance_objective.go line 69 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

the behavior doesn't change

This could mean a few different things. Could you call out that we try to balance CPU across stores on the same node?

Updated with an example.


pkg/kv/kvserver/rebalance_objective.go line 77 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

and balance each stores' process usage

Consider adding something like ", using the measured replica cpu usage only to determine which replica to rebalance, but not when to rebalance or who to rebalance to". Or something along those lines to better contrast the two approaches.

Added


pkg/kv/kvserver/rebalance_objective.go line 80 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"estimated"

Updated


pkg/kv/kvserver/store.go line 2612 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is probably not worth it, but if there is a concern about forgetting that -1 is a sentinel value in some code paths and rebalancing towards a node with !grunning.Supported(), we could also make math.MaxInt64 the sentinel value.

I prefer the negative value because it feels like a pattern we could follow if we bake it into the store pool updates - see below comment.


pkg/kv/kvserver/allocator/storepool/store_pool.go line 691 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

These all feel a bit error-prone, but I don't have a good suggestion for how to make it harder to forget to check this.

There's probably some work that could go into the storepool either this release or early next release to make this easier to handle. Something akin to "virtualizing" the store descriptors and keeping the underlying gossiped descriptor as a physical value of sorts.

related issues:
#89395
#93532

Collaborator Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


pkg/kv/kvserver/rebalance_objective.go line 186 at r9 (raw file):

// Objective returns the current rebalance objective.
func (rom *RebalanceObjectiveManager) Objective(ctx context.Context) LBRebalancingObjective {
	rom.maybeUpdateRebalanceObjective(ctx)

I added this maybeUpdateRebalanceObjective so that the storepool would be checked for CPU=-1. The check is now more expensive. That isn't an issue for CPU balancing, since the objective is only checked when calling Capacity and each time the store rebalancer wakes up. It is an issue for CPU load splitting, however, which checks the RebalanceObjective more frequently, including on every request to the replica.

I'm going to change the design of how we get the information. I'll ping when I've updated.

@kvoli kvoli force-pushed the 230120.cpu-balancing branch 3 times, most recently from c9db908 to 66d42dd, on February 8, 2023 18:48
@kvoli kvoli requested a review from nvanbenschoten February 8, 2023 18:51
@kvoli kvoli force-pushed the 230120.cpu-balancing branch 4 times, most recently from a4e30d4 to 130c1d5, on February 9, 2023 00:31
Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 1 of 3 files at r11, 22 of 22 files at r12, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


pkg/kv/kvserver/rebalance_objective.go line 69 at r6 (raw file):

Previously, kvoli (Austen) wrote…

Updated with an example.

Thanks, the added example is helpful.


pkg/kv/kvserver/rebalance_objective.go line 202 at r12 (raw file):

	capacityChangeNotifier.SetOnCapacityChange(
		func(storeID roachpb.StoreID, old, cur roachpb.StoreCapacity) {
			if old.CPUPerSecond < 0 && cur.CPUPerSecond >= 0 ||

This might be easier to understand if written as if (old.CPUPerSecond < 0) != (new.CPUPerSecond < 0) {

@kvoli kvoli force-pushed the 230120.cpu-balancing branch from 130c1d5 to a65f21f on February 9, 2023 13:56
Collaborator Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist and @nvanbenschoten)


pkg/kv/kvserver/rebalance_objective.go line 202 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This might be easier to understand if written as if (old.CPUPerSecond < 0) != (new.CPUPerSecond < 0) {

That is easier to understand than what I had, updated to this.
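
For reference, the adopted comparison boils down to a sign-flip check (sketch with a hypothetical helper name):

```go
// cpuSupportChanged reports whether a store's ability to report CPU flipped in
// either direction between two gossiped capacities, where a negative
// CPUPerSecond marks "unsupported".
func cpuSupportChanged(oldCPUPerSecond, curCPUPerSecond float64) bool {
	return (oldCPUPerSecond < 0) != (curCPUPerSecond < 0)
}
```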

kvoli added 2 commits February 9, 2023 15:11
Previously `Queries` was hardcoded as the dimension to use when creating
test ranges for use in store rebalancer tests. This patch enables
passing in any `dimension`.

Release note: None
Previously, when estimating the impact a lease transfer would have, we
would not differentiate between rebalance/transfer. This commit adds a
utility method `TransferImpact` to `RangeUsageInfo` which is now used
when making estimations about lease transfers.

Currently, this is identical to `Load`, which was previously used
instead for transfers.

Release note: None
This patch allows the store rebalancer to use CPU in place of QPS when
balancing load on a cluster. This patch adds `cpu` as an option with the
cluster setting:

`kv.allocator.load_based_rebalancing.objective`

When set to `cpu`, rather than `qps`. The store rebalancer will perform
a mostly identical function, however, it will target balancing the sum
of all replica's cpu time on each store, rather than qps. The default
remains as `qps` here.

Similar to QPS, the rebalance threshold can be set to allow controlling
the range above and below the mean store CPU is considered imbalanced,
either overfull or underfull respectively:

`kv.allocator.cpu_rebalance_threshold`: 0.1

In order to manage with mixed versions during upgrade and some
architectures not supporting the cpu sampling method, a rebalance
objective manager is introduced in `rebalance_objective.go`. The manager
mediates access to the rebalance objective and overwrites it in cases
where the objective set in the cluster setting cannot be supported.

resolves: cockroachdb#95380

Release note (ops change): Add option to balance cpu time (cpu)
instead of queries per second (qps) among stores in a cluster. This is
done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`.
`kv.allocator.cpu_rebalance_threshold` is also added, similar to
`kv.allocator.qps_rebalance_threshold` to control the target range for
store cpu above and below the cluster mean.
@kvoli kvoli force-pushed the 230120.cpu-balancing branch from a65f21f to c28ed6b on February 9, 2023 15:11
@kvoli
Collaborator Author

kvoli commented Feb 9, 2023

TYFTR

@kvoli
Collaborator Author

kvoli commented Feb 9, 2023

bors r=nvanbenschoten

@craig craig bot merged commit 732c3a7 into cockroachdb:master Feb 9, 2023
@craig
Contributor

craig bot commented Feb 9, 2023

Build succeeded:

craig bot pushed a commit that referenced this pull request Feb 10, 2023
96128: kvserver: introduce cpu load based splits r=nvanbenschoten a=kvoli

This PR adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used to decide split points and when splitting
should occur; similarly, when set to `cpu`, the request cpu against the
leaseholder replica is used to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold`, which defaults to `250ms` of cpu
time per second, i.e. a replica consistently using 1/4 of a machine's
processor.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png)


resolves: #95377
resolves: #83519
depends on: #96127

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
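
A rough, self-contained Go sketch of that threshold check (the helper and constant names are hypothetical; the real splitter's API differs):

```go
package main

import (
	"fmt"
	"time"
)

// defaultLoadCPUThreshold mirrors the documented default of
// kv.range_split.load_cpu_threshold: 250ms of CPU time per second.
const defaultLoadCPUThreshold = 250 * time.Millisecond

// shouldConsiderCPUSplit reports whether a replica's measured request CPU per
// second exceeds the split threshold, making it a load-based split candidate.
func shouldConsiderCPUSplit(replicaCPUPerSecond time.Duration) bool {
	return replicaCPUPerSecond > defaultLoadCPUThreshold
}

func main() {
	fmt.Println(shouldConsiderCPUSplit(300 * time.Millisecond)) // true
}
```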

96129: bazel: is_dev_linux config setting should be false when cross_flag is true r=rickystewart a=srosenberg

When cross-compiling on a gce worker and invoking bazel directly, build fails with "cannot find -lresolv_wrapper", owing to the fact that is_dev_linux is not mutually exclusive of cross_linux. That is, when both configs are active "-lresolv_wrapper" is passed to clinkopts in pkg/ccl/gssapiccl/BUILD.bazel; the cross-compiler doesn't have this lib nor does it need it.

Note, when using ./dev instead of bazel, the above issue is side-stepped because the dev wrapper invokes bazel inside docker which ignores ~/.bazelrc.

The workaround is to make is_dev_linux false when cross_flag is true.

Epic: none

Release note: None

96431: ui: databases shows partial results for size limit error r=j82w a=j82w

The databases page displays partial results instead of
just showing an error message.

Sorting is disabled if there are more than 2 pages of results,
which is currently configured to 40 dbs. This still allows most
users to use sort functionality, but prevents large customers
from breaking when it would need to do a network call per
database.

The database details are now loaded on demand for the first
page only. Previously a network call was done for all databases 
which would result in 2k network calls. It now only loads the page
of details the user is looking at. 

part of: #94333

https://www.loom.com/share/31b213b2f1764d0f9868bd967b9388b8

Release note: none

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
Co-authored-by: j82w <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this pull request Mar 9, 2023
In cockroachdb#96127 we added the option to load balance replica CPU instead of QPS
across stores in a cluster. It is desirable to view the signal being
controlled for rebalancing in the replication dashboard, similar to QPS.

This commit adds the `rebalancing.cpunanospersecond` metric to the
replication metrics dashboard.

Resolves: cockroachdb#98109

Release note (ui change): `rebalancing.cpunanospersecond` is now
included in the replication metrics dashboard.
craig bot pushed a commit that referenced this pull request Mar 9, 2023
97717: multitenant: add AdminUnsplitRequest capability r=knz a=ecwall

Fixes #97716

1) Add a new `tenantcapabilitiespb.CanAdminUnsplit` capability.
2) Check capability in `Authorizer.HasCapabilityForBatch`.
3) Remove `ExecutorConfig.RequireSystemTenant` call from
   `execFactory.ConstructAlterTableUnsplit`,
   `execFactory.ConstructAlterTableUnsplitAll`.

Release note: None


98250: kvserver: add minimum cpu lb split threshold r=andrewbaptist a=kvoli

Previously, `kv.range_split.load_cpu_threshold` had no minimum setting value. It is undesirable to allow users to set this setting too low, as excessive splitting may occur.

`kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`.

See #96869 for additional context on the threshold.

Resolves: #98107

Release note (ops change): `kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`.

98270: dashboards: add replica cpu to repl dashboard r=xinhaoz a=kvoli

In #96127 we added the option to load balance replica CPU instead of QPS across
stores in a cluster. It is desirable to view the signal being controlled for
rebalancing in the replication dashboard, similar to QPS.

This pr adds the `rebalancing.cpunanospersecond` metric to the replication
metrics dashboard.

The avg QPS graph on the replication graph previously described the metric as
"Exponentially weighted average", however this is not true.

This pr updates the description to just be "moving average", which is accurate.
Note that follow-the-workload does use an exponentially weighted value; however,
the metric in the dashboard is not the same.

This pr also updates the graph header to include Replica in the title: "Average
Replica Queries per Node". QPS is specific to replicas. This is already
mentioned in the description.

Resolves: #98109


98289: storage: mvccExportToWriter should always return buffered range keys r=adityamaru a=stevendanna

In #96691, we changed the return type of mvccExportToWriter such that it now indicates when a CPU limit has been reached. As part of that change, when the CPU limit was reached, we would immediately `return` rather than `break`ing out of the loop. This introduced a bug, since the code after the loop that the `break` was taking us to is important. Namely, we may have previously buffered range keys that we still need to write into our response. By replacing the break with a return, these range keys were lost.

The end-user impact of this is that a BACKUP that _ought_ to have included range keys (such as a backup of a table with a rolled back IMPORT) would not include those range keys and thus would end up resurrecting deleted keys upon restore.

This PR brings back the `break` and adds a test that covers exporting with range keys under CPU exhaustion.

This bug only ever existed on master.

Informs #97694

Epic: none

Release note: None
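
The restored pattern, as a generic sketch (none of this is the storage package's real API):

```go
// exportWithLimit iterates point keys but, when a limit stops iteration early,
// uses break rather than return so the post-loop flush of buffered range keys
// still runs.
func exportWithLimit(pointKeys, bufferedRangeKeys []string, limitReached func() bool, emit func(string)) {
	for _, k := range pointKeys {
		if limitReached() {
			break // not `return`: the buffered range keys below must still be written
		}
		emit(k)
	}
	for _, rk := range bufferedRangeKeys {
		emit(rk) // flushed even after an early exit
	}
}
```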

98329: sql: fix iteration conditions in crdb_internal.scan r=ajwerner a=stevendanna

Rather than using the Next() key of the last key in the response when iterating, we should use the resume span. The previous code could result in a failure in the rare case that the end key of our scan exactly matched the successor key of the very last key in the iteration.

Epic: none

Release note: None

Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Steven Danna <[email protected]>