allocator: make disk capacity threshold a setting #97409
Conversation
Force-pushed b912b40 to f442715.
Force-pushed f442715 to afff942.
return errors.Errorf(
	"Cannot set kv.allocator.max_disk_utilization_threshold greater than 0.95")
}
if f < 0.05 {
Interesting, we're allowing setting this to near zero? Almost seems as though we could abuse this to implement "gateway-only nodes" (nodes that don't hold any replicas) if we allowed setting this to 0.00001.
Just a drive-by.
I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.
Force-pushed 5b37f25 to 2aea517.
Reviewed 3 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @tbg)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1915 at r2 (raw file):
func (a *Allocator) ScorerOptions(ctx context.Context) *RangeCountScorerOptions {
	return &RangeCountScorerOptions{
		DiskOptions: a.DiskOptions(),
nit: I don't love the name `DiskOptions`, as it isn't clear this is specifically about Capacity. `DiskCapacityOptions` would be better, but not necessarily worth changing everything for. There are other `Options` about disks that are not captured here (max throughput, number of ranges, ...).
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 186 at r1 (raw file):
Previously, kvoli (Austen) wrote…
I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.
This cap is too low. Capping it at 0.99 or 1.0 for safety, so people don't think it's a percentage and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled because of that. Normally, it doesn't make sense for the default value to be the same as the maximum allowed value. I also think it makes sense to add some warnings about setting this too low (but not add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected. The 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 211 at r2 (raw file):
	},
)
Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 618 at r2 (raw file):
type DiskOptions struct {
	RebalanceToThreshold float64
	MaxThreshold         float64
nit: Consider renaming this `ShedThreshold` to clarify what happens when it hits this threshold.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1235 at r2 (raw file):
Capacity: roachpb.StoreCapacity{
	Capacity:  100,
	Available: 100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)),
Consider writing this as: 100 - int64(100*float64(defaultRebalanceToMaxDiskUtilizationThreshold)) -1
as written it is "right on" the border, and could cause confusing off-by-one errors.
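As an illustration of the "one past the border" pattern being suggested, here is a minimal, self-contained Go sketch; the constant's value is an assumption standing in for the package-level default, and the helper name is invented for the example:

```go
package allocatorimpl

// The constant name follows the suggestion above; its value here is only an
// assumption so the example stands alone.
const defaultRebalanceToMaxDiskUtilizationThreshold = 0.925

// availableJustOverThreshold returns an Available value that puts a store
// one unit past the rebalance-to threshold instead of exactly on the border,
// avoiding confusing off-by-one behavior in a test.
func availableJustOverThreshold(capacity int64) int64 {
	return capacity - int64(float64(capacity)*defaultRebalanceToMaxDiskUtilizationThreshold) - 1
}
```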
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1247 at r2 (raw file):
Capacity: roachpb.StoreCapacity{
	Capacity:  100,
	Available: (100 - int64(100*float64(defaultMaxDiskUtilizationThreshold))) / 2,
Consider writing as 100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1310 at r2 (raw file):
	a.Metrics,
)
if expResult := (i >= 3); expResult != result {
Why did the expected result change? Add a comment on why this is 3.
Force-pushed bec2f94 to d6a3925.
Force-pushed d6a3925 to e003e83.
Thanks for taking a look. I've updated the patch in response to your feedback. It should be good for another round of review @andrewbaptist.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @tbg)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1915 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: I don't love the name `DiskOptions`, as it isn't clear this is specifically about Capacity. `DiskCapacityOptions` would be better, but not necessarily worth changing everything for. There are other `Options` about disks that are not captured here (max throughput, number of ranges, ...).
Good point. Updated to DiskCapacityOptions
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 186 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
This cap is too low. Capping it at 0.99 or 1.0 for safety, so people don't think it's a percentage and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled because of that. Normally, it doesn't make sense for the default value to be the same as the maximum allowed value. I also think it makes sense to add some warnings about setting this too low (but not add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected. The 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.
Updated to >0.99. I also added a min check at 0 to prevent negative numbers. I don't think this actually matters, but it's there for completeness' sake.
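For illustration, a minimal sketch of what a validator with these bounds might look like, following the errors.Errorf pattern from the quoted diff; this is an approximation, not the exact code in the PR:

```go
package allocatorimpl

import "github.com/cockroachdb/errors"

// validateDiskUtilizationThreshold is a hypothetical validator mirroring the
// bounds discussed here: values above 0.99 are rejected (likely a
// misconfigured percentage such as "90"), as are negative values.
func validateDiskUtilizationThreshold(f float64) error {
	if f > 0.99 {
		return errors.Errorf(
			"cannot set kv.allocator.max_disk_utilization_threshold greater than 0.99")
	}
	if f < 0 {
		return errors.Errorf(
			"cannot set kv.allocator.max_disk_utilization_threshold less than 0")
	}
	return nil
}
```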
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 211 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.
Updated to a >0.99 check. Added a note not to set this higher than the other setting, and vice versa on the other setting.
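If a strict ordering check were ever added, a hypothetical helper might look like the following; the PR itself only adds documentation notes, so this is purely illustrative and the function name is invented:

```go
package allocatorimpl

import "github.com/cockroachdb/errors"

// validateThresholdOrdering sketches the strict-ordering check being
// discussed: the rebalance-to threshold should not exceed the
// shed/block-all threshold.
func validateThresholdOrdering(rebalanceTo, maxDiskUtil float64) error {
	if rebalanceTo > maxDiskUtil {
		return errors.Errorf(
			"kv.allocator.rebalance_to_max_disk_utilization_threshold (%f) should not exceed "+
				"kv.allocator.max_disk_utilization_threshold (%f)", rebalanceTo, maxDiskUtil)
	}
	return nil
}
```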
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 618 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: Consider renaming this `ShedThreshold` to clarify what happens when it hits this threshold.
It is Shed + Block Allocation + Block Rebalancing. I'll update to be ShedAndBlockAll
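As a hedged sketch of where that rename could land — the field name and the simplified check below are assumptions based on this thread, not necessarily the final code:

```go
package allocatorimpl

// DiskCapacityOptions is a sketch of the renamed options struct discussed in
// this thread.
type DiskCapacityOptions struct {
	// RebalanceToThreshold blocks a store from being used as a rebalance
	// target once its disk utilization exceeds it.
	RebalanceToThreshold float64
	// ShedAndBlockAllThreshold additionally blocks allocation and causes
	// replicas to be actively shed off the store.
	ShedAndBlockAllThreshold float64
}

// maxCapacityCheck is a simplified illustration (not the allocator's actual
// signature) of how the shed threshold might gate a candidate store, where
// utilization is the fraction of disk capacity in use, in [0, 1].
func (do DiskCapacityOptions) maxCapacityCheck(utilization float64) bool {
	return utilization < do.ShedAndBlockAllThreshold
}
```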
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1235 at r2 (raw file):
100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
I've updated this to be clearer.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1247 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Consider writing as
100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
Ditto.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1310 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Why did the expected result change? Add a comment on why this is 3.
Previously, the storepool wouldn't update the average range count for a StoreList if the candidate store didn't pass the maxCapacityCheck. I removed this to break the dependency on the constant.
This logic was already redundant for rebalancing, as we create equivalence classes based on candidates being equally valid/diverse and not disk-full, so the store list would never contain a full-disk store to affect the range count. For allocation, I updated the logic to be similar, so we check earlier and remove candidates before creating a valid store list.
This test was previously asserting that s3, which had 5 ranges (all stores sum = 27, num stores = 5), would be rebalanced away from, because s4 and s5 would not have their range count added to the store list's average range count. The average for the store list would previously be 7/5 = 1.4, so we would expect shouldRebalanceAway for s3 to be true since it is greater than the mean. That is no longer the case, so I've updated the test.
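A rough, hypothetical sketch of the "filter first, then build the store list" ordering described here; the types and function names are invented for illustration, not the actual allocator code:

```go
package allocatorimpl

// candidateStore is a stand-in for the store information the allocator
// considers; only the fields this sketch needs are included.
type candidateStore struct {
	RangeCount      int
	DiskUtilization float64 // fraction of disk capacity in use, in [0, 1]
}

// filterFullDiskStores drops stores over the shed/block-all threshold before
// the store list (and its average range count) is built, mirroring the
// "check earlier and remove candidates" ordering described above.
func filterFullDiskStores(stores []candidateStore, shedThreshold float64) []candidateStore {
	var filtered []candidateStore
	for _, s := range stores {
		if s.DiskUtilization < shedThreshold {
			filtered = append(filtered, s)
		}
	}
	return filtered
}

// meanRangeCount averages the range count over the remaining candidates, so
// full-disk stores can no longer be silently skipped from (or skew) the mean.
func meanRangeCount(stores []candidateStore) float64 {
	if len(stores) == 0 {
		return 0
	}
	total := 0
	for _, s := range stores {
		total += s.RangeCount
	}
	return float64(total) / float64(len(stores))
}
```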
Thanks for the quick turnaround on this. We should absolutely merge this in because it will be helpful to have these knobs, but do you have an idea how a system would respond if it had an issue like the customer issue we had related to this? I assume it would keep the disks from getting full, but do you know if there would be other negative implications on rebalancing?
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @tbg)
There would only be negative effects when there's a combination of other issues as well as a full disk, such as overload on every non-full-disk store. I think this should generally be pretty robust unless misconfigured.
bors r=andrewbaptist
Build failed:
CI failure could be legit. I'll investigate tomorrow. bors r-
bors r=andrewbaptist
Build succeeded:
In cockroachdb#97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. `TestAllocatorFullDisks` assumes the total number of range bytes is equal to or less than the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there was more than the expected amount of total range bytes.

This patch changes the ranges-per-node calculation to use the rebalance threshold again, instead of the shed threshold.

Fixes: cockroachdb#100033

Release note: None
100189: kvcoord: Restart ranges on a dedicated goroutine. r=miretskiy a=miretskiy

Restart ranges on a dedicated goroutine (if needed). Fix logic bug in stuck range handling. Increase verbosity of logging to help debug mux rangefeed issues.

Informs #99560
Informs #99640
Informs #99214
Informs #98925
Informs #99092
Informs #99212
Informs #99910

Release note: None

100525: rpc: Handle closed error r=erikgrinaker a=andrewbaptist

We close the listener before closing the connection. This can result in a spurious failure due to the Listener also closing our connection.

Epic: none
Fixes: #100391
Fixes: #77754
Informs: #80034

Release note: None

100528: sql: fix flaky TestSQLStatsCompactor r=j82w a=j82w

The test failure is showing more total wide scans than expected. Change the compact stats job to run once a year to avoid it running at the same time as the test. The interceptor is disabled right after delete, reducing the possibility of another operation causing a conflict.

Epic: none
closes: #99653

Release note: none

100589: allocator: deflake full disk test r=andrewbaptist a=kvoli

In #97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. `TestAllocatorFullDisks` assumes the total number of range bytes is equal to or less than the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there was more than the expected amount of total range bytes. This patch changes the ranges-per-node calculation to use the rebalance threshold again, instead of the shed threshold.

```
dev test pkg/kv/kvserver/allocator/allocatorimpl -f TestAllocatorFullDisks -v --stress
...
15714 runs so far, 0 failures, over 39m45s
```

Fixes: #100033

Release note: None

100610: roachtest: set config.Quiet to true r=herkolategan a=srosenberg

After refactoring in [1], the default of config.Quiet was set to false since the roachprod CLI option is intended to set it to true. This resulted in an unwanted side effect, namely roachtests running with the new default. Consequently, test_runner's log ended up with a bunch of (terminal) escape codes due to the (status) spinner. This change ensures roachtest explicitly sets config.Quiet to true.

[1] #99133

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
Previously, the store disk utilization was checked against a constant
threshold to determine if the store was a valid allocation target.
This commit adds two cluster settings to replace the constant.
kv.allocator.max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a
rebalance or allocation target and will actively have replicas moved off
of it.
kv.allocator.rebalance_to_max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a
rebalance target.
Resolves: #97392
Release note (ops change): Introduce two cluster settings to control disk
utilization thresholds for allocation.
kv.allocator.rebalance_to_max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a rebalance target.
kv.allocator.max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a rebalance or allocation target
and will actively have replicas moved off of it.
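To make the relationship between the two thresholds concrete, here is a hypothetical Go sketch; it is not the PR's actual code, and the function and parameter names are invented for illustration:

```go
package allocatorimpl

// storeDiskDecision illustrates how the two settings described above
// interact for a store whose disk utilization is a fraction in [0, 1]:
//   - at or above maxDiskUtil: never an allocation or rebalance target, and
//     replicas are actively shed off the store.
//   - at or above rebalanceToMaxDiskUtil (but below maxDiskUtil): never a
//     rebalance target, though allocation is not blocked.
func storeDiskDecision(util, rebalanceToMaxDiskUtil, maxDiskUtil float64) (canAllocate, canRebalanceTo, shed bool) {
	switch {
	case util >= maxDiskUtil:
		return false, false, true
	case util >= rebalanceToMaxDiskUtil:
		return true, false, false
	default:
		return true, true, false
	}
}
```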