Merge cockroachdb#96031 cockroachdb#96127 cockroachdb#96440 cockroachdb#96828 cockroachdb#96870 cockroachdb#96874 cockroachdb#96883

96031: sql: add mixed version test for system.role_members user ids upgrade r=rafiss a=andyyang890

This patch adds a mixed-version logictest that ensures that GRANT ROLE
continues to work properly in a cluster with both 22.2 and 23.1 nodes
(i.e., nodes that have run the system.role_members user IDs upgrade).
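
For context, a sketch of the kind of statement sequence the test exercises (role and user names here are illustrative, not taken from the test itself):

```
-- Role membership is recorded in system.role_members; names below are hypothetical.
CREATE ROLE dev_team;
CREATE USER alice;
GRANT dev_team TO alice;
SHOW GRANTS ON ROLE dev_team;
```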

Part of cockroachdb#92342

Release note: None

96127: kvserver: introduce cpu rebalancing r=nvanbenschoten a=kvoli

This patch allows the store rebalancer to use CPU in place of QPS when
balancing load on a cluster. It adds `cpu` as an option for the cluster
setting:

`kv.allocator.load_based_rebalancing.objective`

When set to `cpu` rather than `qps`, the store rebalancer performs a
mostly identical function, but balances the sum of all replicas' CPU
time on each store rather than QPS. The default remains `qps`.

As with QPS, a rebalance threshold can be set to control how far above
or below the mean a store's CPU must be before the store is considered
imbalanced, i.e. overfull or underfull respectively:

`kv.allocator.store_cpu_rebalance_threshold`: 0.1
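
For illustration, switching the objective and adjusting the threshold would look like the following (the values are examples only; the threshold setting name is the one registered in this patch):

```
SET CLUSTER SETTING kv.allocator.load_based_rebalancing.objective = 'cpu';
SET CLUSTER SETTING kv.allocator.store_cpu_rebalance_threshold = 0.15;
```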

To handle mixed versions during upgrade, and architectures that do not
support the CPU sampling method, a rebalance objective manager is
introduced in `rebalance_objective.go`. The manager mediates access to
the rebalance objective and overrides it in cases where the objective
set in the cluster setting cannot be supported.

The results when using CPU in comparison to QPS can be found [here](https://docs.google.com/document/d/1QLhD20BTamjj3-dSG9F1gW7XMBy9miGPpJpmu2Dn3yo/edit#) (internal).


<details>
<summary>Results Summary</summary>

![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png)

![image](https://user-images.githubusercontent.com/39606633/215580626-3d748ba1-e9a4-4abb-8acd-2c319203932e.png)

![image](https://user-images.githubusercontent.com/39606633/215580585-58e6000d-b6cf-430a-b4b7-d14a77eab3bd.png)

</details>


<details>
<summary>Detailed Allocbench Results</summary>

```
kv/r=0/access=skew
master
    median cost(gb):05.81 cpu(%):14.97 write(%):37.83
    stddev cost(gb):01.87 cpu(%):03.98 write(%):07.01
cpu rebalancing
    median cost(gb):08.76 cpu(%):14.42 write(%):36.61
    stddev cost(gb):02.66 cpu(%):01.85 write(%):04.80
kv/r=0/ops=skew
master
    median cost(gb):06.23 cpu(%):26.05 write(%):57.33
    stddev cost(gb):02.92 cpu(%):05.83 write(%):08.20
cpu rebalancing
    median cost(gb):04.28 cpu(%):11.45 write(%):31.28
    stddev cost(gb):02.25 cpu(%):02.51 write(%):06.68
kv/r=50/ops=skew
master
    median cost(gb):04.36 cpu(%):22.84 write(%):48.09
    stddev cost(gb):01.12 cpu(%):02.71 write(%):05.51
cpu rebalancing
    median cost(gb):04.64 cpu(%):13.49 write(%):43.05
    stddev cost(gb):01.07 cpu(%):01.26 write(%):08.58
kv/r=95/access=skew
master
    median cost(gb):00.00 cpu(%):09.51 write(%):01.24
    stddev cost(gb):00.00 cpu(%):01.74 write(%):00.27
cpu rebalancing
    median cost(gb):00.00 cpu(%):05.66 write(%):01.31
    stddev cost(gb):00.00 cpu(%):01.56 write(%):00.26
kv/r=95/ops=skew
master
    median cost(gb):0.00 cpu(%):47.29 write(%):00.93
    stddev cost(gb):0.09 cpu(%):04.30 write(%):00.17
cpu rebalancing
    median cost(gb):0.00 cpu(%):08.16 write(%):01.30
    stddev cost(gb):0.01 cpu(%):04.59 write(%):00.20
```


</details>

resolves: cockroachdb#95380

Release note (ops change): Added an option to balance CPU time (`cpu`)
instead of queries per second (`qps`) among stores in a cluster. This is
done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`.
`kv.allocator.store_cpu_rebalance_threshold` is also added, similar to
`kv.allocator.qps_rebalance_threshold`, to control the target range for
store CPU above and below the cluster mean.

96440: ui: add execution insights to statement and transaction fingerprint details r=ericharmeling a=ericharmeling

This commit adds execution insights to the Statement Fingerprint and Transaction Fingerprint Details pages.

Part of cockroachdb#83780.

Loom: https://www.loom.com/share/98d2023b672e43fa8016829aa641a829

Note that the SQL queries against the `*_execution_insights` tables are updated to use `SELECT DISTINCT ON (*_fingerprint_id, problems, causes)` (equivalent to `GROUP BY (*_fingerprint_id, problems, causes)`) over the latest results in the tables, rather than `row_number() OVER ( PARTITION BY stmt_fingerprint_id, problem, causes ORDER BY end_time DESC ) AS rank... WHERE rank = 1`. Both patterns return the same result, but one uses aggregation and the other uses a window function. I find the `DISTINCT ON`/`GROUP BY` pattern easier to understand, I'm not seeing much difference in planning/execution time between the two over the same set of data, and I'm seeing `DISTINCT ON`/`GROUP BY` come up as more performant in almost all the secondary sources I've encountered.
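
A schematic version of the two query shapes, for comparison (the table and column names follow the fields mentioned above and are illustrative rather than exact):

```
-- DISTINCT ON form now used: keep the latest row per (fingerprint, problem, causes).
SELECT DISTINCT ON (stmt_fingerprint_id, problem, causes) *
FROM cluster_execution_insights
ORDER BY stmt_fingerprint_id, problem, causes, end_time DESC;

-- Window-function form it replaces.
SELECT * FROM (
  SELECT *,
         row_number() OVER (
           PARTITION BY stmt_fingerprint_id, problem, causes
           ORDER BY end_time DESC
         ) AS rank
  FROM cluster_execution_insights
) AS ranked
WHERE rank = 1;
```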

Release note (ui change): Added execution insights to the Statement Fingerprint Details and Transaction Fingerprint Details Pages.

96828: collatedstring: support default, C, and POSIX in expressions r=otan a=rafiss

fixes cockroachdb#50734
fixes cockroachdb#95667
informs cockroachdb#57255

---
### collatedstring: create new package 

Move the small amount of code from tree/collatedstring.go

---

### collatedstring: support C and POSIX in expressions

Release note (sql change): Expressions of the form `COLLATE "default"`,
`COLLATE "C"`, and `COLLATE "POSIX"` are now supported. Since the
default collation cannot be changed currently, these expressions are all
equivalent. The expressions are evaluated by treating the input as a
normal string, and ignoring the collation.

This means that comparisons between strings and collated strings that
use "default", "C", or "POSIX" are now supported.

Creating a column with the "C" or "POSIX" collations is still not
supported.
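
Assuming the behavior described above, a few illustrative statements:

```
SELECT 'foo' COLLATE "C" = 'foo';                  -- comparison between a collated and a normal string
SELECT 'a' COLLATE "POSIX" < 'b' COLLATE "POSIX";  -- evaluated as a plain string comparison
CREATE TABLE t (s STRING COLLATE "C");             -- still unsupported; this errors
```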

96870: kvserver: use replicasByKey addition func in snapshot path r=tbg a=pavelkalinnikov

This commit takes one step towards better code sharing between the `Replica` initialization paths: split trigger and snapshot application. It makes both use the same method to check for and insert the initialized `Replica` into the `replicasByKey` map.

Touches cockroachdb#94912

96874: roachtest: run scheduled backup only on clusters with enterprise license r=stevendanna a=msbutler

Epic: none

Release note: None

96883: go.mod: bump Pebble to 829675f94811 r=RaduBerinde a=RaduBerinde

829675f9 db: fix ObsoleteSize stat
2f086b74 db: refactor compaction splitting to reduce key comparisons

Release note: None
Epic: none

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
8 people committed Feb 9, 2023
8 parents a439273 + 7cb807e + c28ed6b + 2ec730f + a7e9c4a + f5bbd77 + 399a278 + e95cc41 commit 732c3a7
Showing 87 changed files with 2,218 additions and 498 deletions.
6 changes: 3 additions & 3 deletions DEPS.bzl
@@ -1525,10 +1525,10 @@ def go_deps():
patches = [
"@com_github_cockroachdb_cockroach//build/patches:com_github_cockroachdb_pebble.patch",
],
sha256 = "7087b386f9b4da9ce24708ae5b26eb175655892045b92c30952ba353b47d42aa",
strip_prefix = "github.com/cockroachdb/[email protected]20230208205550-65fa048bf403",
sha256 = "b6c2765d18f70c6ebbb0f1ff1d9a5d1b453a4bd5c57b8dd3a6625d7c4e28d63d",
strip_prefix = "github.com/cockroachdb/[email protected]20230209160836-829675f94811",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230208205550-65fa048bf403.zip",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230209160836-829675f94811.zip",
],
)
go_repository(
2 changes: 1 addition & 1 deletion build/bazelutil/distdir_files.bzl
@@ -197,7 +197,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/google-api-go-client/com_github_cockroachdb_google_api_go_client-v0.80.1-0.20221117193156-6a9f7150cb93.zip": "b3378c579f4f4340403038305907d672c86f615f8233118a8873ebe4229c4f39",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/gostdlib/com_github_cockroachdb_gostdlib-v1.19.0.zip": "c4d516bcfe8c07b6fc09b8a9a07a95065b36c2855627cb3514e40c98f872b69e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/logtags/com_github_cockroachdb_logtags-v0.0.0-20230118201751-21c54148d20b.zip": "ca7776f47e5fecb4c495490a679036bfc29d95bd7625290cfdb9abb0baf97476",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230208205550-65fa048bf403.zip": "7087b386f9b4da9ce24708ae5b26eb175655892045b92c30952ba353b47d42aa",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230209160836-829675f94811.zip": "b6c2765d18f70c6ebbb0f1ff1d9a5d1b453a4bd5c57b8dd3a6625d7c4e28d63d",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/redact/com_github_cockroachdb_redact-v1.1.3.zip": "7778b1e4485e4f17f35e5e592d87eb99c29e173ac9507801d000ad76dd0c261e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/returncheck/com_github_cockroachdb_returncheck-v0.0.0-20200612231554-92cdbca611dd.zip": "ce92ba4352deec995b1f2eecf16eba7f5d51f5aa245a1c362dfe24c83d31f82b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/sentry-go/com_github_cockroachdb_sentry_go-v0.6.1-cockroachdb.2.zip": "fbb2207d02aecfdd411b1357efe1192dbb827959e36b7cab7491731ac55935c9",
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -295,4 +295,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
version version 1000022.2-38 set the active cluster version in the format '<major>.<minor>'
version version 1000022.2-40 set the active cluster version in the format '<major>.<minor>'
4 changes: 3 additions & 1 deletion docs/generated/settings/settings.html
@@ -45,9 +45,11 @@
<tr><td><div id="setting-jobs-retention-time" class="anchored"><code>jobs.retention_time</code></div></td><td>duration</td><td><code>336h0m0s</code></td><td>the amount of time to retain records for completed jobs before</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-lease-rebalancing-enabled" class="anchored"><code>kv.allocator.load_based_lease_rebalancing.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to enable rebalancing of range leases based on load and latency</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing" class="anchored"><code>kv.allocator.load_based_rebalancing</code></div></td><td>enumeration</td><td><code>leases and replicas</code></td><td>whether to rebalance based on the distribution of load across stores [off = 0, leases = 1, leases and replicas = 2]</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing-objective" class="anchored"><code>kv.allocator.load_based_rebalancing.objective</code></div></td><td>enumeration</td><td><code>qps</code></td><td>what objective does the cluster use to rebalance; if set to `qps` the cluster will attempt to balance qps among stores, if set to `cpu` the cluster will attempt to balance cpu usage among stores [qps = 0, cpu = 1]</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing-interval" class="anchored"><code>kv.allocator.load_based_rebalancing_interval</code></div></td><td>duration</td><td><code>1m0s</code></td><td>the rough interval at which each store will check for load-based lease / replica rebalancing opportunities</td></tr>
<tr><td><div id="setting-kv-allocator-qps-rebalance-threshold" class="anchored"><code>kv.allocator.qps_rebalance_threshold</code></div></td><td>float</td><td><code>0.1</code></td><td>minimum fraction away from the mean a store&#39;s QPS (such as queries per second) can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-allocator-range-rebalance-threshold" class="anchored"><code>kv.allocator.range_rebalance_threshold</code></div></td><td>float</td><td><code>0.05</code></td><td>minimum fraction away from the mean a store&#39;s range count can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-allocator-store-cpu-rebalance-threshold" class="anchored"><code>kv.allocator.store_cpu_rebalance_threshold</code></div></td><td>float</td><td><code>0.1</code></td><td>minimum fraction away from the mean a store&#39;s cpu usage can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-bulk-io-write-max-rate" class="anchored"><code>kv.bulk_io_write.max_rate</code></div></td><td>byte size</td><td><code>1.0 TiB</code></td><td>the rate limit (bytes/sec) to use for writes to disk on behalf of bulk io ops</td></tr>
<tr><td><div id="setting-kv-bulk-sst-max-allowed-overage" class="anchored"><code>kv.bulk_sst.max_allowed_overage</code></div></td><td>byte size</td><td><code>64 MiB</code></td><td>if positive, allowed size in excess of target size for SSTs from export requests; export requests (i.e. BACKUP) may buffer up to the sum of kv.bulk_sst.target_size and kv.bulk_sst.max_allowed_overage in memory</td></tr>
<tr><td><div id="setting-kv-bulk-sst-target-size" class="anchored"><code>kv.bulk_sst.target_size</code></div></td><td>byte size</td><td><code>16 MiB</code></td><td>target size for SSTs emitted from export requests; export requests (i.e. BACKUP) may buffer up to the sum of kv.bulk_sst.target_size and kv.bulk_sst.max_allowed_overage in memory</td></tr>
@@ -236,6 +238,6 @@
<tr><td><div id="setting-trace-opentelemetry-collector" class="anchored"><code>trace.opentelemetry.collector</code></div></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 4317 will be used.</td></tr>
<tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://&lt;ui&gt;/#/debug/tracez</td></tr>
<tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 9411 will be used.</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-38</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-40</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
</tbody>
</table>
2 changes: 1 addition & 1 deletion go.mod
@@ -114,7 +114,7 @@ require (
github.com/cockroachdb/go-test-teamcity v0.0.0-20191211140407-cff980ad0a55
github.com/cockroachdb/gostdlib v1.19.0
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811
github.com/cockroachdb/redact v1.1.3
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd
github.com/cockroachdb/stress v0.0.0-20220803192808-1806698b1b7b
4 changes: 2 additions & 2 deletions go.sum
@@ -483,8 +483,8 @@ github.com/cockroachdb/gostdlib v1.19.0/go.mod h1:+dqqpARXbE/gRDEhCak6dm0l14AaTy
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b h1:r6VH0faHjZeQy818SGhaone5OnYfxFR/+AzdY3sf5aE=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403 h1:48ODta9RgBfr4nfFG/gwfAcP4MNWDuSR1rOQ3hPefFo=
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403/go.mod h1:Nb5lgvnQ2+oGlE/EyZy4+2/CxRh9KfvCXnag1vtpxVM=
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811 h1:ytcWPaNPhNoGMWEhDvS3zToKcDpRsLuRolQJBVGdozk=
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811/go.mod h1:Nb5lgvnQ2+oGlE/EyZy4+2/CxRh9KfvCXnag1vtpxVM=
github.com/cockroachdb/redact v1.1.3 h1:AKZds10rFSIj7qADf0g46UixK8NNLwWTNdCIGS5wfSQ=
github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd h1:KFOt5I9nEKZgCnOSmy8r4Oykh8BYQO8bFOTgHDS8YZA=
2 changes: 2 additions & 0 deletions pkg/BUILD.bazel
@@ -2048,6 +2048,7 @@ GO_TARGETS = [
"//pkg/util/circuit:circuit_test",
"//pkg/util/cloudinfo:cloudinfo",
"//pkg/util/cloudinfo:cloudinfo_test",
"//pkg/util/collatedstring:collatedstring",
"//pkg/util/contextutil:contextutil",
"//pkg/util/contextutil:contextutil_test",
"//pkg/util/ctxgroup:ctxgroup",
@@ -3143,6 +3144,7 @@ GET_X_DATA_TARGETS = [
"//pkg/util/cgroups:get_x_data",
"//pkg/util/circuit:get_x_data",
"//pkg/util/cloudinfo:get_x_data",
"//pkg/util/collatedstring:get_x_data",
"//pkg/util/contextutil:get_x_data",
"//pkg/util/ctxgroup:get_x_data",
"//pkg/util/duration:get_x_data",
9 changes: 8 additions & 1 deletion pkg/clusterversion/cockroach_versions.go
@@ -411,6 +411,10 @@ const (
// responsible for polling the jobs table for metrics.
V23_1_CreateJobsMetricsPollingJob

// V23_1AllocatorCPUBalancing adds balancing CPU usage among stores using
// the allocator and store rebalancer. It assumes that at this version,
// stores now include their CPU in the StoreCapacity proto when gossiping.
V23_1AllocatorCPUBalancing
// *************************************************
// Step (1): Add new versions here.
// Do not add new versions to a patch release.
@@ -708,7 +712,10 @@
Key: V23_1_CreateJobsMetricsPollingJob,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 38},
},

{
Key: V23_1AllocatorCPUBalancing,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 40},
},
// *************************************************
// Step (2): Add new versions here.
// Do not add new versions to a patch release.
2 changes: 2 additions & 0 deletions pkg/kv/kvserver/BUILD.bazel
@@ -29,6 +29,7 @@ go_library(
"raft_transport_metrics.go",
"raft_truncator_replica.go",
"range_log.go",
"rebalance_objective.go",
"replica.go",
"replica_app_batch.go",
"replica_application_cmd.go",
@@ -273,6 +274,7 @@ go_test(
"raft_transport_test.go",
"raft_transport_unit_test.go",
"range_log_test.go",
"rebalance_objective_test.go",
"replica_application_cmd_buf_test.go",
"replica_application_state_machine_test.go",
"replica_batch_updates_test.go",
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
@@ -2189,7 +2189,7 @@ func (a *Allocator) TransferLeaseTarget(
return candidates[a.randGen.Intn(len(candidates))]

case allocator.LoadConvergence:
leaseReplLoad := usageInfo.Load()
leaseReplLoad := usageInfo.TransferImpact()
candidates := make([]roachpb.StoreID, 0, len(existing)-1)
for _, repl := range existing {
if repl.StoreID != leaseRepl.StoreID() {
16 changes: 16 additions & 0 deletions pkg/kv/kvserver/allocator/allocatorimpl/threshold.go
@@ -27,6 +27,8 @@ func getLoadThreshold(dim load.Dimension, sv *settings.Values) float64 {
switch dim {
case load.Queries:
return allocator.QPSRebalanceThreshold.Get(sv)
case load.CPU:
return allocator.CPURebalanceThreshold.Get(sv)
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
@@ -51,6 +53,8 @@ func getLoadMinThreshold(dim load.Dimension) float64 {
switch dim {
case load.Queries:
return allocator.MinQPSThresholdDifference
case load.CPU:
return allocator.MinCPUThresholdDifference
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
@@ -76,6 +80,8 @@ func getLoadRebalanceMinRequiredDiff(dim load.Dimension, sv *settings.Values) fl
switch dim {
case load.Queries:
return allocator.MinQPSDifferenceForTransfers.Get(sv)
case load.CPU:
return allocator.MinCPUDifferenceForTransfers
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
@@ -117,3 +123,13 @@ func MakeQPSOnlyDim(v float64) load.Load {
dims[load.Queries] = v
return dims
}

// WithAllDims returns a load vector with all dimensions filled in with the
// value given.
func WithAllDims(v float64) load.Load {
dims := load.Vector{}
for i := range dims {
dims[i] = v
}
return dims
}
45 changes: 45 additions & 0 deletions pkg/kv/kvserver/allocator/base.go
@@ -37,6 +37,30 @@ const (
// lightly loaded clusters.
MinQPSThresholdDifference = 100

// MinCPUThresholdDifference is the minimum CPU difference from the cluster
// mean that this system should care about. The system won't attempt to
// take action if a store's CPU differs from the mean by less than this
// amount even if it is greater than the percentage threshold. This
// prevents too many lease transfers or range rebalances in lightly loaded
// clusters.
//
// NB: This represents 5% (1/20) utilization of 1 cpu on average. This
// number was arrived at from testing to minimize thrashing. This number is
// set independent of processor speed and assumes identical value of cpu
// time across all stores. i.e. all cpu's are identical.
MinCPUThresholdDifference = float64(50 * time.Millisecond)

// MinCPUDifferenceForTransfers is the minimum CPU difference that a
// store rebalncer would care about to reconcile (via lease or replica
// rebalancing) between any two stores.
//
// NB: This is set to be two times the minimum threshold that a store needs
// to be above or below the mean to be considered overfull or underfull
// respectively. This is to make lease transfers and replica rebalances
// less sensistive to jitters in any given workload by introducing
// additional friction before taking these actions.
MinCPUDifferenceForTransfers = 2 * MinCPUThresholdDifference

// defaultLoadBasedRebalancingInterval is how frequently to check the store-level
// balance of the cluster.
defaultLoadBasedRebalancingInterval = time.Minute
@@ -107,6 +131,27 @@ var QPSRebalanceThreshold = func() *settings.FloatSetting {
return s
}()

// CPURebalanceThreshold is the minimum ratio of a store's cpu time to the mean
// cpu time at which that store is considered overfull or underfull of cpu
// usage.
var CPURebalanceThreshold = func() *settings.FloatSetting {
s := settings.RegisterFloatSetting(
settings.SystemOnly,
"kv.allocator.store_cpu_rebalance_threshold",
"minimum fraction away from the mean a store's cpu usage can be before it is considered overfull or underfull",
0.10,
settings.NonNegativeFloat,
func(f float64) error {
if f < 0.01 {
return errors.Errorf("cannot set kv.allocator.store_cpu_rebalance_threshold to less than 0.01")
}
return nil
},
)
s.SetVisibility(settings.Public)
return s
}()

// LoadBasedRebalanceInterval controls how frequently each store checks for
// load-base lease/replica rebalancing opportunties.
var LoadBasedRebalanceInterval = settings.RegisterPublicDurationSettingWithExplicitUnit(
1 change: 1 addition & 0 deletions pkg/kv/kvserver/allocator/load/BUILD.bazel
@@ -10,6 +10,7 @@ go_library(
],
importpath = "github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/load",
visibility = ["//visibility:public"],
deps = ["//pkg/util/humanizeutil"],
)

get_x_data(name = "get_x_data")
13 changes: 12 additions & 1 deletion pkg/kv/kvserver/allocator/load/dimension.go
@@ -10,14 +10,21 @@

package load

import "fmt"
import (
"fmt"
"time"

"github.com/cockroachdb/cockroach/pkg/util/humanizeutil"
)

// Dimension is a singe dimension of load that a component may track.
type Dimension int

const (
// Queries refers to the number of queries.
Queries Dimension = iota
// CPU refers to the cpu time (ns) used in processing.
CPU

nDimensionsTyped
nDimensions = int(nDimensionsTyped)
@@ -28,6 +35,8 @@ func (d Dimension) String() string {
switch d {
case Queries:
return "queries-per-second"
case CPU:
return "cpu-per-second"
default:
panic(fmt.Sprintf("cannot name: unknown dimension with ordinal %d", d))
}
@@ -38,6 +47,8 @@ func (d Dimension) Format(value float64) string {
switch d {
case Queries:
return fmt.Sprintf("%.1f", value)
case CPU:
return string(humanizeutil.Duration(time.Duration(int64(value))))
default:
panic(fmt.Sprintf("cannot format value: unknown dimension with ordinal %d", d))
}
19 changes: 19 additions & 0 deletions pkg/kv/kvserver/allocator/load/load.go
@@ -73,10 +73,29 @@ func ElementWiseProduct(a, b Load) Load {
return bimap(a, b, func(ai, bi float64) float64 { return ai * bi })
}

// Scale applies the factor given against every dimension.
func Scale(l Load, factor float64) Load {
return nmap(l, func(_ Dimension, li float64) float64 { return li * factor })
}

// Set returns a new Load with every dimension equal to the value given.
func Set(val float64) Load {
l := Vector{}
return nmap(l, func(_ Dimension, li float64) float64 { return val })
}

func bimap(a, b Load, op func(ai, bi float64) float64) Load {
mapped := Vector{}
for dim := Dimension(0); dim < Dimension(nDimensions); dim++ {
mapped[dim] = op(a.Dim(dim), b.Dim(dim))
}
return mapped
}

func nmap(l Load, op func(d Dimension, li float64) float64) Load {
mapped := Vector{}
for dim := Dimension(0); dim < Dimension(nDimensions); dim++ {
mapped[dim] = op(dim, l.Dim(dim))
}
return mapped
}
18 changes: 18 additions & 0 deletions pkg/kv/kvserver/allocator/range_usage_info.go
@@ -39,5 +39,23 @@ type RangeRequestLocalityInfo struct {
func (r RangeUsageInfo) Load() load.Load {
dims := load.Vector{}
dims[load.Queries] = r.QueriesPerSecond
dims[load.CPU] = r.RequestCPUNanosPerSecond + r.RaftCPUNanosPerSecond
return dims
}

// TransferImpact returns the impact of transferring the lease for the range,
// given the usage information. The impact is assumed to be symmetric, e.g. the
// receiving store of the transfer will have load = prev_load(recv) + impact
// after the transfer, whilst the sending side will have load =
// prev_load(sender) - impact after the transfer.
func (r RangeUsageInfo) TransferImpact() load.Load {
dims := load.Vector{}
dims[load.Queries] = r.QueriesPerSecond
// Only use the request recorded cpu. This assumes that all replicas will
// use the same amount of raft cpu - which may be dubious.
//
// TODO(kvoli): Look to separate out leaseholder vs replica cpu usage in
// accounting to account for follower reads if able.
dims[load.CPU] = r.RequestCPUNanosPerSecond
return dims
}