Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
96031: sql: add mixed version test for system.role_members user ids upgrade r=rafiss a=andyyang890

This patch adds a mixed version logictest that ensures that GRANT ROLE
continues to work properly in a cluster with both 22.2 and 23.1 nodes
(i.e. nodes that have run the system.role_members user ids upgrade).

Part of #92342

Release note: None

96127: kvserver: introduce cpu rebalancing r=nvanbenschoten a=kvoli

This patch allows the store rebalancer to use CPU in place of QPS when
balancing load on a cluster. This patch adds `cpu` as an option with the
cluster setting:

`kv.allocator.load_based_rebalancing.objective`

When set to `cpu`, rather than `qps`. The store rebalancer will perform
a mostly identical function, however target balancing the sum of all
replica's cpu time on each store, rather than qps. The default remains
as `qps` here.

Similar to QPS, the rebalance threshold can be set to allow controlling
the range above and below the mean store CPU is considered imbalanced,
either overfull or underfull respectively:

`kv.allocator.cpu_rebalance_threshold`: 0.1

In order to manage with mixed versions during upgrade and some
architectures not supporting the cpu sampling method, a rebalance
objective manager is introduced in `rebalance_objective.go`. The manager
mediates access to the rebalance objective and overwrites it in cases
where the objective set in the cluster setting cannot be supported.

The results when using CPU in comparison to QPS can be found [here](https://docs.google.com/document/d/1QLhD20BTamjj3-dSG9F1gW7XMBy9miGPpJpmu2Dn3yo/edit#) (internal).


<details>
<summary>Results Summary</summary>

![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png)

![image](https://user-images.githubusercontent.com/39606633/215580626-3d748ba1-e9a4-4abb-8acd-2c319203932e.png)

![image](https://user-images.githubusercontent.com/39606633/215580585-58e6000d-b6cf-430a-b4b7-d14a77eab3bd.png)

</details>


<details>
<summary>Detailed Allocbench Results</summary>

```
kv/r=0/access=skew
master
    median cost(gb):05.81 cpu(%):14.97 write(%):37.83
    stddev cost(gb):01.87 cpu(%):03.98 write(%):07.01
cpu rebalancing
    median cost(gb):08.76 cpu(%):14.42 write(%):36.61
    stddev cost(gb):02.66 cpu(%):01.85 write(%):04.80
kv/r=0/ops=skew
master
    median cost(gb):06.23 cpu(%):26.05 write(%):57.33
    stddev cost(gb):02.92 cpu(%):05.83 write(%):08.20
cpu rebalancing
    median cost(gb):04.28 cpu(%):11.45 write(%):31.28
    stddev cost(gb):02.25 cpu(%):02.51 write(%):06.68
kv/r=50/ops=skew
master
    median cost(gb):04.36 cpu(%):22.84 write(%):48.09
    stddev cost(gb):01.12 cpu(%):02.71 write(%):05.51
cpu rebalancing
    median cost(gb):04.64 cpu(%):13.49 write(%):43.05
    stddev cost(gb):01.07 cpu(%):01.26 write(%):08.58
kv/r=95/access=skew
master
    median cost(gb):00.00 cpu(%):09.51 write(%):01.24
    stddev cost(gb):00.00 cpu(%):01.74 write(%):00.27
cpu rebalancing
    median cost(gb):00.00 cpu(%):05.66 write(%):01.31
    stddev cost(gb):00.00 cpu(%):01.56 write(%):00.26
kv/r=95/ops=skew
master
    median cost(gb):0.00 cpu(%):47.29 write(%):00.93
    stddev cost(gb):0.09 cpu(%):04.30 write(%):00.17
cpu rebalancing
    median cost(gb):0.00 cpu(%):08.16 write(%):01.30
    stddev cost(gb):0.01 cpu(%):04.59 write(%):00.20
```


</details>

resolves: #95380

Release note (ops change) Add option to balance cpu time (cpu)
instead of queries per second (qps) among stores in a cluster. This is
done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`.
`kv.allocator.cpu_rebalance_threshold` is also added, similar to
`kv.allocator.qps_rebalance_threshold` to control the target range for
store cpu above and below the cluster mean.

96440: ui: add execution insights to statement and transaction fingerprint details r=ericharmeling a=ericharmeling

This commit adds execution insights to the Statement Fingerprint and Transaction Fingerprint Details pages.

Part of #83780.

Loom: https://www.loom.com/share/98d2023b672e43fa8016829aa641a829

Note that the SQL queries against the `*_execution_insights` tables are updated to `SELECT DISTINCT ON (*_fingerprint_id, problems, causes)` (equivalent to `GROUP BY (*_fingerprint_id, problems, causes)`) from the latest results in the tables, rather than `row_number() OVER ( PARTITION BY stmt_fingerprint_id, problem, causes ORDER BY end_time DESC ) AS rank... WHERE rank = 1`. Both patterns return the same result, but one uses aggregation and the other uses a window function. I find the `DISTINCT ON/GROUP BY` pattern easier to understand, I'm not seeing much difference in the planning/execution time between the two over the same set of data, and I'm seeing `DISTINCT ON/GROUP BY` coming up as more performant in almost all the secondary sources I've encountered.

Release note (ui change): Added execution insights to the Statement Fingerprint Details and Transaction Fingerprint Details Pages.

96828: collatedstring: support default, C, and POSIX in expressions r=otan a=rafiss

fixes #50734
fixes #95667
informs #57255

---
### collatedstring: create new package 

Move the small amount of code from tree/collatedstring.go

---

### collatedstring: support C and POSIX in expressions

Release note (sql change): Expressions of the form `COLLATE "default"`,
`COLLATE "C"`, and `COLLATE "POSIX"` are now supported. Since the
default collation cannot be changed currently, these expressions are all
equivalent. The expressions are evaluated by treating the input as a
normal string, and ignoring the collation.

This means that comparisons between strings and collated strings that
use "default", "C", or "POSIX" are now supported.

Creating a column with the "C" or "POSIX" collations is still not
supported.

96870: kvserver: use replicasByKey addition func in snapshot path r=tbg a=pavelkalinnikov

This commit makes one step towards better code sharing between `Replica` initialization paths: split trigger and snapshot application. It makes both to use the same method to check and insert the initialized `Replica` to `replicasByKey` map.

Touches #94912

96874: roachtest: run scheduled backup only on clusters with enterprise license r=stevendanna a=msbutler

Epic: none

Release note: None

96883: go.mod: bump Pebble to 829675f94811 r=RaduBerinde a=RaduBerinde

829675f9 db: fix ObsoleteSize stat
2f086b74 db: refactor compaction splitting to reduce key comparisons

Release note: None
Epic: none

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
  • Loading branch information
8 people committed Feb 9, 2023
8 parents a439273 + 7cb807e + c28ed6b + 2ec730f + a7e9c4a + f5bbd77 + 399a278 + e95cc41 commit 732c3a7
Show file tree
Hide file tree
Showing 87 changed files with 2,218 additions and 498 deletions.
6 changes: 3 additions & 3 deletions DEPS.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -1525,10 +1525,10 @@ def go_deps():
patches = [
"@com_github_cockroachdb_cockroach//build/patches:com_github_cockroachdb_pebble.patch",
],
sha256 = "7087b386f9b4da9ce24708ae5b26eb175655892045b92c30952ba353b47d42aa",
strip_prefix = "github.com/cockroachdb/[email protected]20230208205550-65fa048bf403",
sha256 = "b6c2765d18f70c6ebbb0f1ff1d9a5d1b453a4bd5c57b8dd3a6625d7c4e28d63d",
strip_prefix = "github.com/cockroachdb/[email protected]20230209160836-829675f94811",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230208205550-65fa048bf403.zip",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230209160836-829675f94811.zip",
],
)
go_repository(
Expand Down
2 changes: 1 addition & 1 deletion build/bazelutil/distdir_files.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/google-api-go-client/com_github_cockroachdb_google_api_go_client-v0.80.1-0.20221117193156-6a9f7150cb93.zip": "b3378c579f4f4340403038305907d672c86f615f8233118a8873ebe4229c4f39",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/gostdlib/com_github_cockroachdb_gostdlib-v1.19.0.zip": "c4d516bcfe8c07b6fc09b8a9a07a95065b36c2855627cb3514e40c98f872b69e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/logtags/com_github_cockroachdb_logtags-v0.0.0-20230118201751-21c54148d20b.zip": "ca7776f47e5fecb4c495490a679036bfc29d95bd7625290cfdb9abb0baf97476",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230208205550-65fa048bf403.zip": "7087b386f9b4da9ce24708ae5b26eb175655892045b92c30952ba353b47d42aa",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230209160836-829675f94811.zip": "b6c2765d18f70c6ebbb0f1ff1d9a5d1b453a4bd5c57b8dd3a6625d7c4e28d63d",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/redact/com_github_cockroachdb_redact-v1.1.3.zip": "7778b1e4485e4f17f35e5e592d87eb99c29e173ac9507801d000ad76dd0c261e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/returncheck/com_github_cockroachdb_returncheck-v0.0.0-20200612231554-92cdbca611dd.zip": "ce92ba4352deec995b1f2eecf16eba7f5d51f5aa245a1c362dfe24c83d31f82b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/sentry-go/com_github_cockroachdb_sentry_go-v0.6.1-cockroachdb.2.zip": "fbb2207d02aecfdd411b1357efe1192dbb827959e36b7cab7491731ac55935c9",
Expand Down
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
Original file line number Diff line number Diff line change
Expand Up @@ -295,4 +295,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
version version 1000022.2-38 set the active cluster version in the format '<major>.<minor>'
version version 1000022.2-40 set the active cluster version in the format '<major>.<minor>'
4 changes: 3 additions & 1 deletion docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -45,9 +45,11 @@
<tr><td><div id="setting-jobs-retention-time" class="anchored"><code>jobs.retention_time</code></div></td><td>duration</td><td><code>336h0m0s</code></td><td>the amount of time to retain records for completed jobs before</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-lease-rebalancing-enabled" class="anchored"><code>kv.allocator.load_based_lease_rebalancing.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to enable rebalancing of range leases based on load and latency</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing" class="anchored"><code>kv.allocator.load_based_rebalancing</code></div></td><td>enumeration</td><td><code>leases and replicas</code></td><td>whether to rebalance based on the distribution of load across stores [off = 0, leases = 1, leases and replicas = 2]</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing-objective" class="anchored"><code>kv.allocator.load_based_rebalancing.objective</code></div></td><td>enumeration</td><td><code>qps</code></td><td>what objective does the cluster use to rebalance; if set to `qps` the cluster will attempt to balance qps among stores, if set to `cpu` the cluster will attempt to balance cpu usage among stores [qps = 0, cpu = 1]</td></tr>
<tr><td><div id="setting-kv-allocator-load-based-rebalancing-interval" class="anchored"><code>kv.allocator.load_based_rebalancing_interval</code></div></td><td>duration</td><td><code>1m0s</code></td><td>the rough interval at which each store will check for load-based lease / replica rebalancing opportunities</td></tr>
<tr><td><div id="setting-kv-allocator-qps-rebalance-threshold" class="anchored"><code>kv.allocator.qps_rebalance_threshold</code></div></td><td>float</td><td><code>0.1</code></td><td>minimum fraction away from the mean a store&#39;s QPS (such as queries per second) can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-allocator-range-rebalance-threshold" class="anchored"><code>kv.allocator.range_rebalance_threshold</code></div></td><td>float</td><td><code>0.05</code></td><td>minimum fraction away from the mean a store&#39;s range count can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-allocator-store-cpu-rebalance-threshold" class="anchored"><code>kv.allocator.store_cpu_rebalance_threshold</code></div></td><td>float</td><td><code>0.1</code></td><td>minimum fraction away from the mean a store&#39;s cpu usage can be before it is considered overfull or underfull</td></tr>
<tr><td><div id="setting-kv-bulk-io-write-max-rate" class="anchored"><code>kv.bulk_io_write.max_rate</code></div></td><td>byte size</td><td><code>1.0 TiB</code></td><td>the rate limit (bytes/sec) to use for writes to disk on behalf of bulk io ops</td></tr>
<tr><td><div id="setting-kv-bulk-sst-max-allowed-overage" class="anchored"><code>kv.bulk_sst.max_allowed_overage</code></div></td><td>byte size</td><td><code>64 MiB</code></td><td>if positive, allowed size in excess of target size for SSTs from export requests; export requests (i.e. BACKUP) may buffer up to the sum of kv.bulk_sst.target_size and kv.bulk_sst.max_allowed_overage in memory</td></tr>
<tr><td><div id="setting-kv-bulk-sst-target-size" class="anchored"><code>kv.bulk_sst.target_size</code></div></td><td>byte size</td><td><code>16 MiB</code></td><td>target size for SSTs emitted from export requests; export requests (i.e. BACKUP) may buffer up to the sum of kv.bulk_sst.target_size and kv.bulk_sst.max_allowed_overage in memory</td></tr>
Expand Down Expand Up @@ -236,6 +238,6 @@
<tr><td><div id="setting-trace-opentelemetry-collector" class="anchored"><code>trace.opentelemetry.collector</code></div></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 4317 will be used.</td></tr>
<tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://&lt;ui&gt;/#/debug/tracez</td></tr>
<tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 9411 will be used.</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-38</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-40</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
</tbody>
</table>
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ require (
github.com/cockroachdb/go-test-teamcity v0.0.0-20191211140407-cff980ad0a55
github.com/cockroachdb/gostdlib v1.19.0
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811
github.com/cockroachdb/redact v1.1.3
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd
github.com/cockroachdb/stress v0.0.0-20220803192808-1806698b1b7b
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -483,8 +483,8 @@ github.com/cockroachdb/gostdlib v1.19.0/go.mod h1:+dqqpARXbE/gRDEhCak6dm0l14AaTy
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b h1:r6VH0faHjZeQy818SGhaone5OnYfxFR/+AzdY3sf5aE=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403 h1:48ODta9RgBfr4nfFG/gwfAcP4MNWDuSR1rOQ3hPefFo=
github.com/cockroachdb/pebble v0.0.0-20230208205550-65fa048bf403/go.mod h1:Nb5lgvnQ2+oGlE/EyZy4+2/CxRh9KfvCXnag1vtpxVM=
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811 h1:ytcWPaNPhNoGMWEhDvS3zToKcDpRsLuRolQJBVGdozk=
github.com/cockroachdb/pebble v0.0.0-20230209160836-829675f94811/go.mod h1:Nb5lgvnQ2+oGlE/EyZy4+2/CxRh9KfvCXnag1vtpxVM=
github.com/cockroachdb/redact v1.1.3 h1:AKZds10rFSIj7qADf0g46UixK8NNLwWTNdCIGS5wfSQ=
github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd h1:KFOt5I9nEKZgCnOSmy8r4Oykh8BYQO8bFOTgHDS8YZA=
Expand Down
2 changes: 2 additions & 0 deletions pkg/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -2048,6 +2048,7 @@ GO_TARGETS = [
"//pkg/util/circuit:circuit_test",
"//pkg/util/cloudinfo:cloudinfo",
"//pkg/util/cloudinfo:cloudinfo_test",
"//pkg/util/collatedstring:collatedstring",
"//pkg/util/contextutil:contextutil",
"//pkg/util/contextutil:contextutil_test",
"//pkg/util/ctxgroup:ctxgroup",
Expand Down Expand Up @@ -3143,6 +3144,7 @@ GET_X_DATA_TARGETS = [
"//pkg/util/cgroups:get_x_data",
"//pkg/util/circuit:get_x_data",
"//pkg/util/cloudinfo:get_x_data",
"//pkg/util/collatedstring:get_x_data",
"//pkg/util/contextutil:get_x_data",
"//pkg/util/ctxgroup:get_x_data",
"//pkg/util/duration:get_x_data",
Expand Down
9 changes: 8 additions & 1 deletion pkg/clusterversion/cockroach_versions.go
Original file line number Diff line number Diff line change
Expand Up @@ -411,6 +411,10 @@ const (
// responsible for polling the jobs table for metrics.
V23_1_CreateJobsMetricsPollingJob

// V23_1AllocatorCPUBalancing adds balancing CPU usage among stores using
// the allocator and store rebalancer. It assumes that at this version,
// stores now include their CPU in the StoreCapacity proto when gossiping.
V23_1AllocatorCPUBalancing
// *************************************************
// Step (1): Add new versions here.
// Do not add new versions to a patch release.
Expand Down Expand Up @@ -708,7 +712,10 @@ var rawVersionsSingleton = keyedVersions{
Key: V23_1_CreateJobsMetricsPollingJob,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 38},
},

{
Key: V23_1AllocatorCPUBalancing,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 40},
},
// *************************************************
// Step (2): Add new versions here.
// Do not add new versions to a patch release.
Expand Down
2 changes: 2 additions & 0 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ go_library(
"raft_transport_metrics.go",
"raft_truncator_replica.go",
"range_log.go",
"rebalance_objective.go",
"replica.go",
"replica_app_batch.go",
"replica_application_cmd.go",
Expand Down Expand Up @@ -273,6 +274,7 @@ go_test(
"raft_transport_test.go",
"raft_transport_unit_test.go",
"range_log_test.go",
"rebalance_objective_test.go",
"replica_application_cmd_buf_test.go",
"replica_application_state_machine_test.go",
"replica_batch_updates_test.go",
Expand Down
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
Original file line number Diff line number Diff line change
Expand Up @@ -2189,7 +2189,7 @@ func (a *Allocator) TransferLeaseTarget(
return candidates[a.randGen.Intn(len(candidates))]

case allocator.LoadConvergence:
leaseReplLoad := usageInfo.Load()
leaseReplLoad := usageInfo.TransferImpact()
candidates := make([]roachpb.StoreID, 0, len(existing)-1)
for _, repl := range existing {
if repl.StoreID != leaseRepl.StoreID() {
Expand Down
16 changes: 16 additions & 0 deletions pkg/kv/kvserver/allocator/allocatorimpl/threshold.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ func getLoadThreshold(dim load.Dimension, sv *settings.Values) float64 {
switch dim {
case load.Queries:
return allocator.QPSRebalanceThreshold.Get(sv)
case load.CPU:
return allocator.CPURebalanceThreshold.Get(sv)
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
Expand All @@ -51,6 +53,8 @@ func getLoadMinThreshold(dim load.Dimension) float64 {
switch dim {
case load.Queries:
return allocator.MinQPSThresholdDifference
case load.CPU:
return allocator.MinCPUThresholdDifference
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
Expand All @@ -76,6 +80,8 @@ func getLoadRebalanceMinRequiredDiff(dim load.Dimension, sv *settings.Values) fl
switch dim {
case load.Queries:
return allocator.MinQPSDifferenceForTransfers.Get(sv)
case load.CPU:
return allocator.MinCPUDifferenceForTransfers
default:
panic(errors.AssertionFailedf("Unkown load dimension %d", dim))
}
Expand Down Expand Up @@ -117,3 +123,13 @@ func MakeQPSOnlyDim(v float64) load.Load {
dims[load.Queries] = v
return dims
}

// WithAllDims returns a load vector with all dimensions filled in with the
// value given.
func WithAllDims(v float64) load.Load {
dims := load.Vector{}
for i := range dims {
dims[i] = v
}
return dims
}
45 changes: 45 additions & 0 deletions pkg/kv/kvserver/allocator/base.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,30 @@ const (
// lightly loaded clusters.
MinQPSThresholdDifference = 100

// MinCPUThresholdDifference is the minimum CPU difference from the cluster
// mean that this system should care about. The system won't attempt to
// take action if a store's CPU differs from the mean by less than this
// amount even if it is greater than the percentage threshold. This
// prevents too many lease transfers or range rebalances in lightly loaded
// clusters.
//
// NB: This represents 5% (1/20) utilization of 1 cpu on average. This
// number was arrived at from testing to minimize thrashing. This number is
// set independent of processor speed and assumes identical value of cpu
// time across all stores. i.e. all cpu's are identical.
MinCPUThresholdDifference = float64(50 * time.Millisecond)

// MinCPUDifferenceForTransfers is the minimum CPU difference that a
// store rebalncer would care about to reconcile (via lease or replica
// rebalancing) between any two stores.
//
// NB: This is set to be two times the minimum threshold that a store needs
// to be above or below the mean to be considered overfull or underfull
// respectively. This is to make lease transfers and replica rebalances
// less sensistive to jitters in any given workload by introducing
// additional friction before taking these actions.
MinCPUDifferenceForTransfers = 2 * MinCPUThresholdDifference

// defaultLoadBasedRebalancingInterval is how frequently to check the store-level
// balance of the cluster.
defaultLoadBasedRebalancingInterval = time.Minute
Expand Down Expand Up @@ -107,6 +131,27 @@ var QPSRebalanceThreshold = func() *settings.FloatSetting {
return s
}()

// CPURebalanceThreshold is the minimum ratio of a store's cpu time to the mean
// cpu time at which that store is considered overfull or underfull of cpu
// usage.
var CPURebalanceThreshold = func() *settings.FloatSetting {
s := settings.RegisterFloatSetting(
settings.SystemOnly,
"kv.allocator.store_cpu_rebalance_threshold",
"minimum fraction away from the mean a store's cpu usage can be before it is considered overfull or underfull",
0.10,
settings.NonNegativeFloat,
func(f float64) error {
if f < 0.01 {
return errors.Errorf("cannot set kv.allocator.store_cpu_rebalance_threshold to less than 0.01")
}
return nil
},
)
s.SetVisibility(settings.Public)
return s
}()

// LoadBasedRebalanceInterval controls how frequently each store checks for
// load-base lease/replica rebalancing opportunties.
var LoadBasedRebalanceInterval = settings.RegisterPublicDurationSettingWithExplicitUnit(
Expand Down
1 change: 1 addition & 0 deletions pkg/kv/kvserver/allocator/load/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ go_library(
],
importpath = "github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/load",
visibility = ["//visibility:public"],
deps = ["//pkg/util/humanizeutil"],
)

get_x_data(name = "get_x_data")
13 changes: 12 additions & 1 deletion pkg/kv/kvserver/allocator/load/dimension.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,21 @@

package load

import "fmt"
import (
"fmt"
"time"

"github.com/cockroachdb/cockroach/pkg/util/humanizeutil"
)

// Dimension is a singe dimension of load that a component may track.
type Dimension int

const (
// Queries refers to the number of queries.
Queries Dimension = iota
// CPU refers to the cpu time (ns) used in processing.
CPU

nDimensionsTyped
nDimensions = int(nDimensionsTyped)
Expand All @@ -28,6 +35,8 @@ func (d Dimension) String() string {
switch d {
case Queries:
return "queries-per-second"
case CPU:
return "cpu-per-second"
default:
panic(fmt.Sprintf("cannot name: unknown dimension with ordinal %d", d))
}
Expand All @@ -38,6 +47,8 @@ func (d Dimension) Format(value float64) string {
switch d {
case Queries:
return fmt.Sprintf("%.1f", value)
case CPU:
return string(humanizeutil.Duration(time.Duration(int64(value))))
default:
panic(fmt.Sprintf("cannot format value: unknown dimension with ordinal %d", d))
}
Expand Down
19 changes: 19 additions & 0 deletions pkg/kv/kvserver/allocator/load/load.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,10 +73,29 @@ func ElementWiseProduct(a, b Load) Load {
return bimap(a, b, func(ai, bi float64) float64 { return ai * bi })
}

// Scale applies the factor given against every dimension.
func Scale(l Load, factor float64) Load {
return nmap(l, func(_ Dimension, li float64) float64 { return li * factor })
}

// Set returns a new Load with every dimension equal to the value given.
func Set(val float64) Load {
l := Vector{}
return nmap(l, func(_ Dimension, li float64) float64 { return val })
}

func bimap(a, b Load, op func(ai, bi float64) float64) Load {
mapped := Vector{}
for dim := Dimension(0); dim < Dimension(nDimensions); dim++ {
mapped[dim] = op(a.Dim(dim), b.Dim(dim))
}
return mapped
}

func nmap(l Load, op func(d Dimension, li float64) float64) Load {
mapped := Vector{}
for dim := Dimension(0); dim < Dimension(nDimensions); dim++ {
mapped[dim] = op(dim, l.Dim(dim))
}
return mapped
}
18 changes: 18 additions & 0 deletions pkg/kv/kvserver/allocator/range_usage_info.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,5 +39,23 @@ type RangeRequestLocalityInfo struct {
func (r RangeUsageInfo) Load() load.Load {
dims := load.Vector{}
dims[load.Queries] = r.QueriesPerSecond
dims[load.CPU] = r.RequestCPUNanosPerSecond + r.RaftCPUNanosPerSecond
return dims
}

// TransferImpact returns the impact of transferring the lease for the range,
// given the usage information. The impact is assumed to be symmetric, e.g. the
// receiving store of the transfer will have load = prev_load(recv) + impact
// after the transfer, whilst the sending side will have load =
// prev_load(sender) - impact after the transfer.
func (r RangeUsageInfo) TransferImpact() load.Load {
dims := load.Vector{}
dims[load.Queries] = r.QueriesPerSecond
// Only use the request recorded cpu. This assumes that all replicas will
// use the same amount of raft cpu - which may be dubious.
//
// TODO(kvoli): Look to separate out leaseholder vs replica cpu usage in
// accounting to account for follower reads if able.
dims[load.CPU] = r.RequestCPUNanosPerSecond
return dims
}
Loading

0 comments on commit 732c3a7

Please sign in to comment.