Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: allocation algorithm should transfer replicas and leases away from an overloaded store #82611

Closed
kvoli opened this issue Jun 8, 2022 · 1 comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-sre For issues SRE opened or otherwise cares about tracking. T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented Jun 8, 2022

There have been several recent incidents where we have noted that despite admission control being on, it is possible for overload (CPU, IO) to occur on a store. This is partially due to follower store load not being subjected to admission control on the leaseholder store.

The store rebalancer and replicate queue in this case are unaware of the degraded state of the store, as they consider only QPS and range count respectively; in addition to existing constraint signals such as disk fullness and L0 sublevels [#78608].

The solution would be to add normalized CPU usage to be considered, in much the same way as L0 sub-levels.

This solution should be implemented in two phases:

  1. Add a constraint on store CPU for transferring replicas or leases towards an overloaded store. This check already exists for l0-sublevels.
  2. Enforce the constraint when considering replicas or leases for transfer away from an overloaded store. i.e. When a store does not meet the overload check, stores with leases for ranges, with a replica on the overloaded store will begin transferring them elsewhere. The overloaded store itself will likewise seek to transfer it's own leases.

This overload check must be both relative to the cluster average, as well as set to some threshold.

e.g.

sys_usr_cpu_normalized > threshold && sys_usr_cpu_normalized > mean(cluster...) * 1.1

Jira issue: CRDB-16533

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. labels Jun 8, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jun 8, 2022
@joshimhoff joshimhoff added the O-sre For issues SRE opened or otherwise cares about tracking. label Jun 8, 2022
tbg added a commit to tbg/cockroach that referenced this issue Jul 1, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are
experiencing IO overload, unless this is necessary for quorum. This
commit adds the gossip signal used for this purpose, with the remainder
of the functionality to be added in a follow-up commit. The signal
is gossip on the IO admission control adjustment schedule, i.e. every
15s (with twice that as the TTL).

The signal may also, in the future, be used by the distribution layer to
further improve on our data and load placement algorithms.

Touches cockroachdb#79215.
Touches cockroachdb#77604.
Touches cockroachdb#82611.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jul 5, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are
experiencing IO overload, unless this is necessary for quorum.

This commit teaches admission control to relay the current status of the
IO overload signals to the Store, which in turn gossips it. We add a
type `IOThreshold` that encapsulates the input signals to I/O admission
control to present them as a unified score, which hopefully leads to
less ad-hoc use of the underlying signals.

Together, this paves the way for using the signal to inform distribution
decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215).

Touches cockroachdb#79215.
Touches cockroachdb#77604.
Touches cockroachdb#82611.
Touches cockroachdb#83490.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jul 5, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are
experiencing IO overload, unless this is necessary for quorum.

This commit teaches admission control to relay the current status of the
IO overload signals to the Store, which in turn gossips it. We add a
type `IOThreshold` that encapsulates the input signals to I/O admission
control to present them as a unified score, which hopefully leads to
less ad-hoc use of the underlying signals.

Together, this paves the way for using the signal to inform distribution
decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215).

Touches cockroachdb#79215.
Touches cockroachdb#77604.
Touches cockroachdb#82611.
Touches cockroachdb#83490.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jul 6, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are
experiencing IO overload, unless this is necessary for quorum.

This commit teaches admission control to relay the current status of the
IO overload signals to the Store, which in turn gossips it. We add a
type `IOThreshold` that encapsulates the input signals to I/O admission
control to present them as a unified score, which hopefully leads to
less ad-hoc use of the underlying signals.

Together, this paves the way for using the signal to inform distribution
decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215).

Touches cockroachdb#79215.
Touches cockroachdb#77604.
Touches cockroachdb#82611.
Touches cockroachdb#83490.

Release note: None
craig bot pushed a commit that referenced this issue Jul 7, 2022
83720: admission,kvserver: broadcast per-store IO overload status r=kvoli a=tbg

The plan for #79215 is to have stores not send to followers that are
experiencing IO overload, unless this is necessary for quorum.

This commit teaches admission control to relay the current status of the
IO overload signals to the Store, which in turn gossips it. We add a
type `IOThreshold` that encapsulates the input signals to I/O admission
control to present them as a unified score, which hopefully leads to
less ad-hoc use of the underlying signals.

Together, this paves the way for using the signal to inform distribution
decisions (#83490) and (bluntly) shape raft traffic (#79215).

Touches #79215.
Touches #77604.
Touches #82611.
Touches #83490.

Release note: None

83732: sql: move NullableArgs function property to overload level r=chengxiong-ruan a=chengxiong-ruan

Currently we only have builtins, and that all overloads of a
same function name share function properties. However, we're
going to support user defined functions whose properties can
vary as how users define them. So we need to move any property
that's relevant to UDF to overload level. Currently, `NullableArgs`
is the only one matters.

Release note: None.

83938: sql: add identifiers to sampled query r=THardy98 a=THardy98

Partially addresses: #71328

This change introduces identifiers into the sampled query log, namely:
- Database name
- Session ID
- Transaction ID
- Statement ID

Adding transaction ID incurs an additional lock access, the difference in performance is negligible. Results after running `kv95` on a 3 node GCE cluster using roachprod:
```
setup:

./workload init kv --splits 1000 --read-percent 95
./workload run kv --read-percent 95 --concurrency 64 --sequential --duration 30m
```

```
master:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       29725479        16514.2      3.4      2.8      8.9     16.8    104.9  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0        1564158          869.0      8.5      8.1     15.7     26.2    100.7  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       31289637        17383.1      3.7      2.9     10.0     17.8    104.9
```

```
enrich_telemetry_add_identifiers

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       29635045        16463.9      3.4      2.8      8.9     16.8    117.4  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0        1561022          867.2      8.5      7.9     15.7     26.2    113.2  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       31196067        17331.1      3.7      2.9     10.0     17.8    117.4
```


Release note (sql change): Sampled query telemetry log now includes
session/transaction/statement IDs, and database name of the query.

83953: reducesql: reduce index STORING columns r=mgartner a=mgartner

The `reduce` tool can now reduce `STORING` columns in indexes and unique
constraints.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@kvoli kvoli added the A-kv-distribution Relating to rebalancing and leasing. label Jul 27, 2022
@kvoli
Copy link
Collaborator Author

kvoli commented Aug 7, 2023

Closing in favor of #103320.

@kvoli kvoli closed this as completed Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-sre For issues SRE opened or otherwise cares about tracking. T-kv KV Team
Projects
None yet
Development

No branches or pull requests

2 participants