-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: allocation algorithm should transfer replicas and leases away from an overloaded store #82611
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-sre
For issues SRE opened or otherwise cares about tracking.
T-kv
KV Team
Comments
kvoli
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv
Anything in KV that doesn't belong in a more specific category.
labels
Jun 8, 2022
joshimhoff
added
the
O-sre
For issues SRE opened or otherwise cares about tracking.
label
Jun 8, 2022
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Jul 1, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are experiencing IO overload, unless this is necessary for quorum. This commit adds the gossip signal used for this purpose, with the remainder of the functionality to be added in a follow-up commit. The signal is gossip on the IO admission control adjustment schedule, i.e. every 15s (with twice that as the TTL). The signal may also, in the future, be used by the distribution layer to further improve on our data and load placement algorithms. Touches cockroachdb#79215. Touches cockroachdb#77604. Touches cockroachdb#82611. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Jul 5, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are experiencing IO overload, unless this is necessary for quorum. This commit teaches admission control to relay the current status of the IO overload signals to the Store, which in turn gossips it. We add a type `IOThreshold` that encapsulates the input signals to I/O admission control to present them as a unified score, which hopefully leads to less ad-hoc use of the underlying signals. Together, this paves the way for using the signal to inform distribution decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215). Touches cockroachdb#79215. Touches cockroachdb#77604. Touches cockroachdb#82611. Touches cockroachdb#83490. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Jul 5, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are experiencing IO overload, unless this is necessary for quorum. This commit teaches admission control to relay the current status of the IO overload signals to the Store, which in turn gossips it. We add a type `IOThreshold` that encapsulates the input signals to I/O admission control to present them as a unified score, which hopefully leads to less ad-hoc use of the underlying signals. Together, this paves the way for using the signal to inform distribution decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215). Touches cockroachdb#79215. Touches cockroachdb#77604. Touches cockroachdb#82611. Touches cockroachdb#83490. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Jul 6, 2022
The plan for cockroachdb#79215 is to have stores not send to followers that are experiencing IO overload, unless this is necessary for quorum. This commit teaches admission control to relay the current status of the IO overload signals to the Store, which in turn gossips it. We add a type `IOThreshold` that encapsulates the input signals to I/O admission control to present them as a unified score, which hopefully leads to less ad-hoc use of the underlying signals. Together, this paves the way for using the signal to inform distribution decisions (cockroachdb#83490) and (bluntly) shape raft traffic (cockroachdb#79215). Touches cockroachdb#79215. Touches cockroachdb#77604. Touches cockroachdb#82611. Touches cockroachdb#83490. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jul 7, 2022
83720: admission,kvserver: broadcast per-store IO overload status r=kvoli a=tbg The plan for #79215 is to have stores not send to followers that are experiencing IO overload, unless this is necessary for quorum. This commit teaches admission control to relay the current status of the IO overload signals to the Store, which in turn gossips it. We add a type `IOThreshold` that encapsulates the input signals to I/O admission control to present them as a unified score, which hopefully leads to less ad-hoc use of the underlying signals. Together, this paves the way for using the signal to inform distribution decisions (#83490) and (bluntly) shape raft traffic (#79215). Touches #79215. Touches #77604. Touches #82611. Touches #83490. Release note: None 83732: sql: move NullableArgs function property to overload level r=chengxiong-ruan a=chengxiong-ruan Currently we only have builtins, and that all overloads of a same function name share function properties. However, we're going to support user defined functions whose properties can vary as how users define them. So we need to move any property that's relevant to UDF to overload level. Currently, `NullableArgs` is the only one matters. Release note: None. 83938: sql: add identifiers to sampled query r=THardy98 a=THardy98 Partially addresses: #71328 This change introduces identifiers into the sampled query log, namely: - Database name - Session ID - Transaction ID - Statement ID Adding transaction ID incurs an additional lock access, the difference in performance is negligible. Results after running `kv95` on a 3 node GCE cluster using roachprod: ``` setup: ./workload init kv --splits 1000 --read-percent 95 ./workload run kv --read-percent 95 --concurrency 64 --sequential --duration 30m ``` ``` master: _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 1800.0s 0 29725479 16514.2 3.4 2.8 8.9 16.8 104.9 read _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 1800.0s 0 1564158 869.0 8.5 8.1 15.7 26.2 100.7 write _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result 1800.0s 0 31289637 17383.1 3.7 2.9 10.0 17.8 104.9 ``` ``` enrich_telemetry_add_identifiers _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 1800.0s 0 29635045 16463.9 3.4 2.8 8.9 16.8 117.4 read _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 1800.0s 0 1561022 867.2 8.5 7.9 15.7 26.2 113.2 write _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result 1800.0s 0 31196067 17331.1 3.7 2.9 10.0 17.8 117.4 ``` Release note (sql change): Sampled query telemetry log now includes session/transaction/statement IDs, and database name of the query. 83953: reducesql: reduce index STORING columns r=mgartner a=mgartner The `reduce` tool can now reduce `STORING` columns in indexes and unique constraints. Release note: None Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]> Co-authored-by: Thomas Hardy <[email protected]> Co-authored-by: Marcus Gartner <[email protected]>
Closing in favor of #103320. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-sre
For issues SRE opened or otherwise cares about tracking.
T-kv
KV Team
There have been several recent incidents where we have noted that despite admission control being on, it is possible for overload (CPU, IO) to occur on a store. This is partially due to follower store load not being subjected to admission control on the leaseholder store.
The store rebalancer and replicate queue in this case are unaware of the degraded state of the store, as they consider only QPS and range count respectively; in addition to existing constraint signals such as disk fullness and L0 sublevels [#78608].
The solution would be to add normalized CPU usage to be considered, in much the same way as L0 sub-levels.
This solution should be implemented in two phases:
This overload check must be both relative to the cluster average, as well as set to some threshold.
e.g.
Jira issue: CRDB-16533
The text was updated successfully, but these errors were encountered: