kvserver: check L0 sub-levels on allocation
Previously, the only store health signal used as a hard allocation and
rebalancing constraint was disk capacity. This patch introduces L0
sub-levels as an additional constraint, to avoid allocating and
rebalancing replicas to stores which are unhealthy, as indicated by a
high number of L0 sub-levels.

A store's sub-level count must exceed both (1) the threshold and (2) the
cluster average in order to be considered unhealthy. The average check
ensures that a cluster full of stores with moderately high read
amplification can still make progress, whilst still ensuring that
positively skewed distributions exclude the positive tail.
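
The two-part check can be sketched as follows. This is a minimal
illustration, not the code from this patch: the helpers `mean` and
`unhealthy` and the sample counts are hypothetical, and it assumes the
cluster average is the arithmetic mean of the candidate stores'
sub-level counts.

```go
package main

import "fmt"

// mean returns the arithmetic mean of the sub-level counts, or 0 for an
// empty slice. (Hypothetical helper, not part of the patch.)
func mean(counts []int64) float64 {
	if len(counts) == 0 {
		return 0
	}
	var sum int64
	for _, c := range counts {
		sum += c
	}
	return float64(sum) / float64(len(counts))
}

// unhealthy reports whether a store's L0 sub-level count exceeds both the
// configured threshold and the cluster average. (Hypothetical helper.)
func unhealthy(count, threshold int64, clusterAvg float64) bool {
	return count > threshold && float64(count) > clusterAvg
}

func main() {
	const threshold = 20

	// Every store is above the threshold, but none is above the mean, so
	// no store is excluded and the cluster can still make progress.
	uniform := []int64{25, 25, 25, 25}
	// One store sits in the positive tail and is the only one excluded.
	skewed := []int64{2, 3, 4, 40}

	for _, counts := range [][]int64{uniform, skewed} {
		avg := mean(counts)
		for _, c := range counts {
			fmt.Printf("count=%d avg=%.2f unhealthy=%t\n", c, avg, unhealthy(c, threshold, avg))
		}
	}
}
```

With the uniform distribution no store is flagged, while in the skewed
distribution only the store with 40 sub-levels is.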

A simulation of the effect on candidate exclusion under different L0
sub-level distributions, using the mean as an additional check versus
percentiles, can be found here:
https://gist.github.com/kvoli/be27efd4662e89e8918430a9c7117858

The threshold corresponds to the cluster setting
`kv.allocator.l0_sublevels_threshold`: the number of L0 sub-levels above
which a candidate store may be excluded as a target for rebalancing, or
for both rebalancing and allocation of replicas.

This threshold can be enforced at 4 different levels of strictness,
configured by the cluster setting
`kv.allocator.l0_sublevels_threshold_enforce`.

The 4 levels are:

`block_none`: L0 sub-levels are ignored entirely.
`block_none_log`: L0 sub-levels are logged if the threshold is exceeded.

Both levels below also log as above.

`block_rebalance_to`: L0 sub-levels are considered when excluding stores
as rebalance targets.
`block_all`: L0 sub-levels are considered when excluding stores as
rebalance targets and allocation targets.
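
The four levels can be summarised as a small decision function. This is
a sketch only: the Go identifiers (`enforcement`, `action`, `decide`)
are hypothetical and do not come from the patch, and it assumes the
store has already been found to exceed both the threshold and the
cluster average.

```go
package main

import "fmt"

// enforcement mirrors the four setting values described above. The Go
// identifiers are hypothetical; only the level names come from the text.
type enforcement int

const (
	blockNone enforcement = iota
	blockNoneLog
	blockRebalanceTo
	blockAll
)

// action is the kind of decision being made for a candidate store.
type action int

const (
	allocate action = iota
	rebalance
)

// decide returns whether an unhealthy candidate store is logged and
// whether it is excluded as a target for the given action.
func decide(level enforcement, act action) (logged, excluded bool) {
	switch level {
	case blockNoneLog:
		return true, false
	case blockRebalanceTo:
		return true, act == rebalance
	case blockAll:
		return true, true
	default: // blockNone
		return false, false
	}
}

func main() {
	names := []string{"block_none", "block_none_log", "block_rebalance_to", "block_all"}
	for i, name := range names {
		_, exclAlloc := decide(enforcement(i), allocate)
		logged, exclRebal := decide(enforcement(i), rebalance)
		fmt.Printf("%-20s logged=%t excludeAllocate=%t excludeRebalance=%t\n",
			name, logged, exclAlloc, exclRebal)
	}
}
```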

By default, `kv.allocator.l0_sublevels_threshold` is `20`, which
corresponds to admission control's threshold, above which it begins
limiting admission of work to a store based on store health. The default
enforcement level of `kv.allocator.l0_sublevels_threshold_enforce` is
`block_none_log`.

resolves cockroachdb#73714

Release justification: low risk, high benefit during high read
amplification scenarios where an operator may limit rebalancing to high
read amplification stores, to stop fueling the flame.

Release note (ops change): introduce cluster settings
`kv.allocator.l0_sublevels_threshold` and
`kv.allocator.l0_sublevels_threshold_enforce`, which enable excluding
stores as targets for allocation and rebalancing of replicas when they
have high read amplification, indicated by the number of sub-levels in
level 0 of the store's LSM. When both
`kv.allocator.l0_sublevels_threshold` and the cluster average are
exceeded, the action corresponding to
`kv.allocator.l0_sublevels_threshold_enforce` is taken. `block_none`
will exclude no candidate stores, `block_none_log` will exclude no
candidates but log an event, `block_rebalance_to` will exclude
candidate stores from being targets of rebalance actions, and `block_all`
will exclude candidate stores from being targets of both allocation and
rebalancing. The default `kv.allocator.l0_sublevels_threshold` is
`20` and the default `kv.allocator.l0_sublevels_threshold_enforce` is
`block_none_log`.
kvoli committed Apr 9, 2022
1 parent 5cf9811 commit 83deaaa
Showing 13 changed files with 1,122 additions and 128 deletions.
35 changes: 30 additions & 5 deletions pkg/kv/kvserver/allocator.go
@@ -19,6 +19,7 @@ import (
 	"strings"
 	"time"

+	"github.com/cockroachdb/cockroach/pkg/clusterversion"
 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/constraint"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
 	"github.com/cockroachdb/cockroach/pkg/settings"
@@ -899,7 +900,7 @@ func (a *Allocator) allocateTarget(
 		conf,
 		existingVoters,
 		existingNonVoters,
-		a.scorerOptions(),
+		a.scorerOptions(ctx),
 		// When allocating a *new* replica, we explicitly disregard nodes with any
 		// existing replicas. This is important for multi-store scenarios as
 		// otherwise, stores on the nodes that have existing replicas are simply
@@ -1122,6 +1123,7 @@ func (a Allocator) removeTarget(

 	replicaSetForDiversityCalc := getReplicasForDiversityCalc(targetType, existingVoters, existingReplicas)
 	rankedCandidates := candidateListForRemoval(
+		ctx,
 		candidateStoreList,
 		constraintsChecker,
 		a.storePool.getLocalitiesByStore(replicaSetForDiversityCalc),
@@ -1451,16 +1453,18 @@ func (a Allocator) RebalanceNonVoter(
 	)
 }

-func (a *Allocator) scorerOptions() *rangeCountScorerOptions {
+func (a *Allocator) scorerOptions(ctx context.Context) *rangeCountScorerOptions {
 	return &rangeCountScorerOptions{
+		storeHealthOptions: a.storeHealthOptions(ctx),
 		deterministic: a.storePool.deterministic,
 		rangeRebalanceThreshold: rangeRebalanceThreshold.Get(&a.storePool.st.SV),
 	}
 }

-func (a *Allocator) scorerOptionsForScatter() *scatterScorerOptions {
+func (a *Allocator) scorerOptionsForScatter(ctx context.Context) *scatterScorerOptions {
 	return &scatterScorerOptions{
 		rangeCountScorerOptions: rangeCountScorerOptions{
+			storeHealthOptions: a.storeHealthOptions(ctx),
 			deterministic: a.storePool.deterministic,
 			rangeRebalanceThreshold: 0,
 		},
@@ -1588,6 +1592,24 @@ func (a *Allocator) leaseholderShouldMoveDueToPreferences(
 	return true
 }

+// storeHealthOptions returns the store health options, currently only
+// considering the threshold for L0 sub-levels. This threshold is not
+// considered in allocation or rebalancing decisions (excluding candidate
+// stores as targets) when enforcementLevel is set to storeHealthNoAction or
+// storeHealthLogOnly. By default storeHealthLogOnly is the action taken. When
+// there is a mixed version cluster, storeHealthNoAction is set instead.
+func (a *Allocator) storeHealthOptions(ctx context.Context) storeHealthOptions {
+	enforcementLevel := storeHealthNoAction
+	if a.storePool.st.Version.IsActive(ctx, clusterversion.AutoStatsTableSettings) {
+		enforcementLevel = storeHealthEnforcement(l0SublevelsThresholdEnforce.Get(&a.storePool.st.SV))
+	}
+
+	return storeHealthOptions{
+		enforcementLevel: enforcementLevel,
+		l0SublevelThreshold: l0SublevelsThreshold.Get(&a.storePool.st.SV),
+	}
+}
+
 // TransferLeaseTarget returns a suitable replica to transfer the range lease
 // to from the provided list. It includes the current lease holder replica
 // unless asked to do otherwise by the excludeLeaseRepl parameter.
@@ -1735,11 +1757,14 @@ func (a *Allocator) TransferLeaseTarget(
 		// https://github.com/cockroachdb/cockroach/issues/75630.
 		bestStore, noRebalanceReason := bestStoreToMinimizeQPSDelta(
 			leaseReplQPS,
-			qpsRebalanceThreshold.Get(&a.storePool.st.SV),
-			minQPSDifferenceForTransfers.Get(&a.storePool.st.SV),
 			leaseRepl.StoreID(),
 			candidates,
 			storeDescMap,
+			&qpsScorerOptions{
+				storeHealthOptions: a.storeHealthOptions(ctx),
+				qpsRebalanceThreshold: qpsRebalanceThreshold.Get(&a.storePool.st.SV),
+				minRequiredQPSDiff: minQPSDifferenceForTransfers.Get(&a.storePool.st.SV),
+			},
 		)

 		switch noRebalanceReason {
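
The mixed-version fallback in `storeHealthOptions` above can be
illustrated in isolation. This is a standalone sketch, not the kvserver
code: a plain boolean stands in for the `Version.IsActive` check, the
function `healthOptions` is hypothetical, and the constant names
`storeHealthBlockRebalanceTo` and `storeHealthBlockAll` and all numeric
values are assumptions (only `storeHealthNoAction` and
`storeHealthLogOnly` appear in this diff).

```go
package main

import "fmt"

type storeHealthEnforcement int64

const (
	storeHealthNoAction storeHealthEnforcement = iota
	storeHealthLogOnly
	storeHealthBlockRebalanceTo // assumed name
	storeHealthBlockAll         // assumed name
)

type storeHealthOptions struct {
	enforcementLevel    storeHealthEnforcement
	l0SublevelThreshold int64
}

// healthOptions returns the configured enforcement level only once the
// gating cluster version is active; until then it falls back to
// storeHealthNoAction so a mixed-version cluster behaves uniformly.
func healthOptions(
	versionActive bool, configured storeHealthEnforcement, threshold int64,
) storeHealthOptions {
	level := storeHealthNoAction
	if versionActive {
		level = configured
	}
	return storeHealthOptions{
		enforcementLevel:    level,
		l0SublevelThreshold: threshold,
	}
}

func main() {
	// Mixed-version cluster: the configured level is ignored.
	fmt.Println(healthOptions(false, storeHealthBlockAll, 20).enforcementLevel == storeHealthNoAction)
	// Fully upgraded cluster: the configured level is honoured.
	fmt.Println(healthOptions(true, storeHealthBlockAll, 20).enforcementLevel == storeHealthBlockAll)
}
```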