kvserver: check L0 sub-levels on allocation
Previously, the only store health signal used as a hard allocation and rebalancing constraint was disk capacity. This patch introduces L0 sub-levels as an additional constraint, to avoid allocating and rebalancing replicas to stores that are unhealthy, as indicated by a high number of L0 sub-levels.

A store's sub-level count must exceed both (1) the threshold and (2) the cluster average in order for the store to be considered unhealthy. The average check ensures that a cluster full of stores with moderately high read amplification can still make progress, while positively skewed distributions still have their positive tail excluded.

A simulation of the effect on candidate exclusion under different L0 sub-level distributions, using the mean as an additional check vs. percentiles, can be found here: https://gist.github.com/kvoli/be27efd4662e89e8918430a9c7117858

The threshold corresponds to the cluster setting `kv.allocator.l0_sublevels_threshold`. It is the number of L0 sub-levels above which a candidate store may be excluded as a target for rebalancing, or for both rebalancing and allocation of replicas.

The threshold can be enforced at 4 different levels of strictness, configured by the cluster setting `kv.allocator.l0_sublevels_threshold_enforce`. The 4 levels are:

- `block_none`: L0 sub-levels are ignored entirely.
- `block_none_log`: L0 sub-levels are logged if the threshold is exceeded. Both levels below also log in this way.
- `block_rebalance_to`: L0 sub-levels are considered when excluding stores as rebalance targets.
- `block_all`: L0 sub-levels are considered when excluding stores as rebalance targets and as allocation targets.

By default, `kv.allocator.l0_sublevels_threshold` is `20`, which corresponds to admission control's threshold, above which it begins limiting admission of work to a store based on store health. The default enforcement level of `kv.allocator.l0_sublevels_threshold_enforce` is `block_none_log`.
resolves cockroachdb#73714

Release justification: low risk, high benefit during high read amplification scenarios, where an operator may want to limit rebalancing to high read amplification stores to stop fueling the flame.

Release note (ops change): introduce cluster settings `kv.allocator.l0_sublevels_threshold` and `kv.allocator.l0_sublevels_threshold_enforce`, which enable excluding stores as targets for allocation and rebalancing of replicas when they have high read amplification, indicated by the number of L0 sub-levels in level 0 of the store's LSM. When both `kv.allocator.l0_sublevels_threshold` and the cluster average are exceeded, the action corresponding to `kv.allocator.l0_sublevels_threshold_enforce` is taken. `block_none` will exclude no candidate stores, `block_none_log` will exclude no candidates but log an event, `block_rebalance_to` will exclude candidate stores from being targets of rebalance actions, and `block_all` will exclude candidate stores from being targets of both allocation and rebalancing. The default for `kv.allocator.l0_sublevels_threshold` is `20` and the default for `kv.allocator.l0_sublevels_threshold_enforce` is `block_none_log`.
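
The exclusion rule described in this message can be sketched roughly as follows. This is an illustrative sketch only, not the code in this patch: the type and function names (`Enforcement`, `Action`, `exceedsBoth`, `excluded`, `mean`) are hypothetical, the cluster average is assumed to be a plain arithmetic mean over the candidate stores, and logging is left out.

```go
package main

import "fmt"

// Enforcement mirrors the four levels named in the commit message.
// The Go names are hypothetical, chosen for this sketch.
type Enforcement int

const (
	BlockNone Enforcement = iota
	BlockNoneLog
	BlockRebalanceTo
	BlockAll
)

// Action is the kind of allocator decision being made for a candidate store.
type Action int

const (
	Allocate Action = iota
	Rebalance
)

// mean returns the arithmetic mean of the sub-level counts (0 for no stores).
func mean(subLevels []int) float64 {
	if len(subLevels) == 0 {
		return 0
	}
	sum := 0
	for _, n := range subLevels {
		sum += n
	}
	return float64(sum) / float64(len(subLevels))
}

// exceedsBoth reports whether a store's L0 sub-level count is above both the
// configured threshold and the cluster average.
func exceedsBoth(subLevels, threshold int, clusterAvg float64) bool {
	return subLevels > threshold && float64(subLevels) > clusterAvg
}

// excluded reports whether a candidate store is dropped as a target for the
// given action under the given enforcement level.
func excluded(subLevels, threshold int, clusterAvg float64, level Enforcement, action Action) bool {
	if !exceedsBoth(subLevels, threshold, clusterAvg) {
		return false
	}
	switch level {
	case BlockRebalanceTo:
		return action == Rebalance
	case BlockAll:
		return true
	default:
		// BlockNone and BlockNoneLog never exclude a candidate.
		return false
	}
}

func main() {
	const threshold = 20
	// One skewed cluster and one cluster of uniformly high stores.
	for _, cluster := range [][]int{{2, 3, 4, 30}, {25, 26, 27, 28}} {
		avg := mean(cluster)
		for _, n := range cluster {
			fmt.Printf("sub-levels=%d avg=%.2f excluded(block_all, rebalance)=%t\n",
				n, avg, excluded(n, threshold, avg, BlockAll, Rebalance))
		}
	}
}
```

In the second cluster every store is above the threshold, but only those above the mean are excluded, so some candidates always remain.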