kvserver: store with high read amplification should not be a target of rebalancing #73714

Closed
sumeerbhola opened this issue Dec 11, 2021 · 1 comment · Fixed by #78608 · May be fixed by #73720
Labels
A-kv-replication-constraints, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-kv (KV Team)

Comments

sumeerbhola commented Dec 11, 2021

Looking at a 6 hour interval of a node with a single store that had consistently high read amplification of > 1600, there are 3055 log entries containing "applying snapshot of type INITIAL".
The allocator should not add replicas to a store that is unhealthy in this manner.

There are also 11804 "removing replica" log statements, so probably the allocator has some signal it is using to shed load.

(the Cockroach Labs internal link for these logs https://upload.cockroachlabs.com/receive/?thread=J0LX-VAFS&packageCode=3V8ZJ2nVEB0zprhPefmADBbjcHJ1YEnIerx9xAyeqdE#keyCode=q1HsA9iW08Ftw0eGrEXYMckv635vIQbmw8s-Y5i3FxI)

cc: @aayushshah15

Jira issue: CRDB-11703

sumeerbhola added the C-enhancement and A-kv-replication-constraints labels Dec 11, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Dec 12, 2021
This commit introduces two new cluster settings:
```
kv.snapshot_decline.read_amp_threshold

server.declined_snapshot_timeout
```

With this commit, stores with a read amplification level higher than
`kv.snapshot_decline.read_amp_threshold` will decline all `REBALANCE`
snapshots. Upon receiving a `DECLINED` response, the senders of these snapshots
will consider these receivers `throttled` for
`server.declined_snapshot_timeout`.

This means that stores with poor LSM health will not be considered as valid
candidates for replica rebalancing.

Fixes cockroachdb#73714
Related to cockroachdb#62168

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Dec 13, 2021
blathers-crl bot added the T-kv (KV Team) label Dec 14, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Mar 8, 2022
This commit introduces two new cluster settings:
```
kv.snapshot_decline.read_amp_threshold

server.declined_snapshot.timeout
```

With this commit, stores with a read amplification level higher than
`kv.snapshot_decline.read_amp_threshold` will decline all `REBALANCE`
snapshots. Upon receiving a `DECLINED` response, the senders of these snapshots
will consider these receivers `throttled` for
`server.declined_snapshot.timeout`.

This means that stores with poor LSM health will not be considered as valid
candidates for replica rebalancing.

Fixes cockroachdb#73714
Related to cockroachdb#62168

Release note: None

Release justification: This patch adds a tunable guardrail that could prevent
or mitigate cluster instability
kvoli added a commit to kvoli/cockroach that referenced this issue Mar 31, 2022
Previously, the only store health signal used as a hard allocation and
rebalancing constraint was disk capacity. This patch introduces L0
sub-levels as an additional constraint, to avoid allocating or
rebalancing replicas to stores that are unhealthy, as indicated by a
high number of L0 sub-levels.

A store's sub-level count must exceed both (1) the threshold and (2) the
cluster average in order to be considered unhealthy. The average check ensures
that a cluster full of moderately high read amplification stores is still
able to make progress, whilst ensuring that positively skewed
distributions exclude the positive tail.
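
A minimal sketch of the two-part check, assuming a strict "greater than" comparison for both conditions. The function names are hypothetical and are not taken from the patch.

```go
package main

// meanSublevels returns the average L0 sub-level count across the given
// stores, or 0 when there are no stores.
func meanSublevels(counts []int64) float64 {
	if len(counts) == 0 {
		return 0
	}
	var sum int64
	for _, c := range counts {
		sum += c
	}
	return float64(sum) / float64(len(counts))
}

// exceedsBoth reports whether a store counts as unhealthy in this sketch:
// its sub-level count must be above the configured threshold AND above the
// cluster average.
func exceedsBoth(count, threshold int64, clusterMean float64) bool {
	return count > threshold && float64(count) > clusterMean
}
```

With a threshold of 20, a cluster where every store sits at 25 sub-levels excludes nobody, because no store is above the average of 25. In a skewed cluster such as 2, 3, 4 and 60 sub-levels, only the store at 60 is excluded.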

A simulation of how candidate exclusion behaves under different L0
sub-level distributions, using the mean as an additional check versus
percentiles, can be found here:
https://gist.github.com/kvoli/be27efd4662e89e8918430a9c7117858

The threshold corresponds to the cluster setting
`kv.allocator.l0_sublevels_threshold`, the number of L0 sub-levels above
which a candidate store may be excluded as a target for rebalancing, or
for both rebalancing and allocation of replicas.

The enforcement of this threshold can be applied under 4 different
levels of strictness. This is configured by the cluster setting
`kv.allocator.l0_sublevels_threshold_enforce`.

The 4 levels are:

`block_none`: L0 sub-levels are ignored entirely.
`block_none_log`: L0 sub-levels are logged if the threshold is exceeded.

The two levels below also log, as `block_none_log` does.

`block_rebalance_to`: L0 sub-levels are considered when excluding stores
as rebalance targets.
`block_all`: L0 sub-levels are considered when excluding stores as
rebalance targets and allocation targets.
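
The four levels can be read as a small decision table. The sketch below is only an illustration of that table, with hypothetical Go names. It assumes the store has already been found to exceed both the threshold and the cluster average.

```go
package main

// enforcement mirrors the four levels of the enforcement setting.
type enforcement int

const (
	blockNone        enforcement = iota // ignore L0 sub-levels
	blockNoneLog                        // log only
	blockRebalanceTo                    // log and exclude as rebalance target
	blockAll                            // log and exclude for rebalance and allocation
)

// action is the kind of placement decision being made (assumed names).
type action int

const (
	actionAllocate action = iota
	actionRebalance
)

// decide returns whether an unhealthy store is excluded as a target for the
// given action, and whether the event is logged, at the given level.
func decide(level enforcement, a action) (exclude, logged bool) {
	switch level {
	case blockNoneLog:
		return false, true
	case blockRebalanceTo:
		return a == actionRebalance, true
	case blockAll:
		return true, true
	default: // blockNone and any unknown level: do nothing
		return false, false
	}
}
```

Treating unknown levels like `block_none` is a choice made for this sketch only.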

By default, `kv.allocator.l0_sublevels_threshold` is `20`, which
corresponds to admission control's threshold, above which it begins
limiting admission of work to a store based on store health. The default
enforcement level of `kv.allocator.l0_sublevels_threshold_enforce` is
`block_none_log`.

resolves cockroachdb#73714

Release justification: low risk, high benefit during high read
amplification scenarios where an operator may limit rebalancing to high
read amplification stores, to stop fueling the flame.

Release note (ops change): introduce cluster settings
`kv.allocator.l0_sublevels_threshold` and
`kv.allocator.l0_sublevels_threshold_enforce`, which enable excluding
stores as targets for allocation and rebalancing of replicas when they
have high read amplification, indicated by the number of sub-levels
in level 0 of the store's LSM. When both
`kv.allocator.l0_sublevels_threshold` and the cluster average are
exceeded, the action corresponding to
`kv.allocator.l0_sublevels_threshold_enforce` is taken. `block_none`
will exclude no candidate stores, `block_none_log` will exclude no
candidates but log an event, `block_rebalance_to` will exclude
candidate stores from being targets of rebalance actions, `block_all`
will exclude candidate stores from being targets of both allocation and
rebalancing. Default `kv.allocator.l0_sublevels_threshold` is set to
`20` and `kv.allocator.l0_sublevels_threshold_enforce` is set to
`block_none_log`.
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 1, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 4, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 4, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 4, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 5, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 5, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 6, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 7, 2022
Previously, the only store health signal used as a hard allocation and
rebalancing constraint was disk capacity. This patch introduces L0
sub-levels as an additional constraint, to avoid allocation and
rebalancing to replicas to stores which are unhealthy, indicated by a
high number of L0 sub-levels.

A store's sub-level count  must exceed both the (1) threshold and (2)
cluster in order to be considered unhealthy. The average check ensures
that a cluster full of  moderately high read amplification stores is not
unable to make progress, whilst still ensuring that positively skewed
distributions exclude the positive tail.

Simulation of the effect on candidate exclusion under different L0
sub-level distributions by using the mean as an additional check vs
percentiles can be found here:
https://gist.github.com/kvoli/be27efd4662e89e8918430a9c7117858

The threshold corresponds to the cluster setting
`kv.allocator.L0_sublevels_threshold`, which is the number of L0
sub-levels, that when a candidate store exceeds it will be potentially
excluded as a target for rebalancing, or both rebalancing and allocation
of replicas.

The enforcement of this threshold can be applied under 4 different
levels of strictness. This is configured by the cluster setting:
`kv.allocator.L0_sublevels_threshold_enforce`.

The 4 levels are:

`block_none`: L0 sub-levels is ignored entirely.
`block_none_log`: L0 sub-levels are logged if threshold exceeded.

Both states below log as above.

`block_rebalance_to`: L0 sub-levels are considered when excluding stores
for rebalance targets.
`block_all`: L0 sub-levels are considered when excluding stores for
rebalance targets and allocation targets.

By default, `kv.allocator.l0_sublevels_threshold` is `20`, which
corresponds to admission control's threshold, above which it begins
limiting admission of work to a store based on store health. The default
enforcement level, `kv.allocator.l0_sublevels_threshold_enforce`, is
`block_none_log`.

resolves cockroachdb#73714

Release justification: low risk, high benefit during high read
amplification scenarios, where an operator may limit rebalancing to high
read amplification stores to stop fueling the flame.

Release note (ops change): introduce cluster settings
`kv.allocator.l0_sublevels_threshold` and
`kv.allocator.l0_sublevels_threshold_enforce`, which enable excluding
stores as targets for allocation and rebalancing of replicas when they
have high read amplification, indicated by the number of L0 sub-levels
in the store's LSM. When a store exceeds both
`kv.allocator.l0_sublevels_threshold` and the cluster average, the
action corresponding to `kv.allocator.l0_sublevels_threshold_enforce` is
taken. `block_none` will exclude no candidate stores, `block_none_log`
will exclude no candidates but log an event, `block_rebalance_to` will
exclude candidate stores from being targets of rebalance actions, and
`block_all` will exclude candidate stores from being targets of both
allocation and rebalancing. Default
`kv.allocator.l0_sublevels_threshold` is set to `20` and
`kv.allocator.l0_sublevels_threshold_enforce` is set to
`block_none_log`.
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 8, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 8, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Apr 9, 2022
@craig craig bot closed this as completed in edcefc2 Apr 11, 2022
@cockroachdb cockroachdb deleted a comment from mari-crl May 26, 2022
@kenliu-crl
Contributor

manually reviewed and updated
