
allocator: make disk capacity threshold a setting #97409

Merged (1 commit) on Mar 8, 2023

Conversation

@kvoli (Collaborator) commented Feb 21, 2023

Previously, the store disk utilization was checked against a constant
threshold to determine if the store was a valid allocation target.

This commit adds two cluster settings to replace the constant.

`kv.allocator.max_disk_utilization_threshold`

Maximum disk utilization before a store will never be used as a
rebalance or allocation target and will actively have replicas moved off
of it.

`kv.allocator.rebalance_to_max_disk_utilization_threshold`

Maximum disk utilization before a store will never be used as a
rebalance target.

Resolves: #97392

Release note (ops change): Introduce two cluster settings to control disk
utilization thresholds for allocation.
`kv.allocator.rebalance_to_max_disk_utilization_threshold`: Maximum disk
utilization before a store will never be used as a rebalance target.
`kv.allocator.max_disk_utilization_threshold`: Maximum disk utilization
before a store will never be used as a rebalance or allocation target
and will actively have replicas moved off of it.
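
To make the two thresholds concrete, below is a minimal, self-contained Go sketch of how a store's disk utilization maps to allocator behavior. The names are illustrative, and the default values (0.925 and 0.95) are an assumption mirroring the pre-existing constants, not necessarily the settings' shipped defaults.

```go
package main

import "fmt"

// Hypothetical defaults; the real values come from the two cluster settings
// described above.
const (
	rebalanceToMaxDiskUtil = 0.925 // kv.allocator.rebalance_to_max_disk_utilization_threshold
	maxDiskUtil            = 0.95  // kv.allocator.max_disk_utilization_threshold
)

// diskAction summarizes how the allocator treats a store at a given disk
// utilization under the two thresholds.
func diskAction(util float64) string {
	switch {
	case util >= maxDiskUtil:
		// Never an allocation or rebalance target; replicas are shed.
		return "shed replicas; block allocation and rebalancing"
	case util >= rebalanceToMaxDiskUtil:
		// Still a valid allocation target, but not a rebalance target.
		return "block rebalancing to this store"
	default:
		return "valid allocation and rebalance target"
	}
}

func main() {
	for _, u := range []float64{0.50, 0.93, 0.97} {
		fmt.Printf("util=%.2f -> %s\n", u, diskAction(u))
	}
}
```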

@cockroach-teamcity (Member) commented:

This change is Reviewable

@kvoli kvoli force-pushed the 230221.disk-thresholds branch 3 times, most recently from b912b40 to f442715 on February 24, 2023 01:31
@kvoli kvoli self-assigned this Feb 24, 2023
@kvoli kvoli marked this pull request as ready for review February 24, 2023 01:36
@kvoli kvoli requested a review from a team as a code owner February 24, 2023 01:36
@kvoli kvoli changed the title allocator: [wip] make disk capacity threshold a setting allocator: make disk capacity threshold a setting Feb 24, 2023
@kvoli kvoli force-pushed the 230221.disk-thresholds branch from f442715 to afff942 on February 24, 2023 15:00
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 186 at r1 (raw file):

		return errors.Errorf(
			"Cannot set kv.allocator.max_disk_utilization_threshold greater than 0.95")
	}
	if f < 0.05 {
A Member commented:

Interesting, we're allowing setting this to near zero? Almost seems as though we could abuse this to implement "gateway-only nodes" (nodes that don't hold any replicas) if we allowed setting this to 0.00001.

Just a drive-by.

@kvoli (Collaborator, Author) replied:

I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.

@kvoli kvoli force-pushed the 230221.disk-thresholds branch 2 times, most recently from 5b37f25 to 2aea517 on February 27, 2023 22:46
@kvoli kvoli requested a review from andrewbaptist February 27, 2023 22:46
@andrewbaptist (Collaborator) left a comment:

Reviewed 3 of 8 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @tbg)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1915 at r2 (raw file):

func (a *Allocator) ScorerOptions(ctx context.Context) *RangeCountScorerOptions {
	return &RangeCountScorerOptions{
		DiskOptions:             a.DiskOptions(),

nit: I don't love the name DiskOptions as it isn't clear this is specifically about Capacity. DiskCapacityOptions would be better, but not necessarily worth changing everything for. There are other Options about disks that are not captured here (max throughput, number of ranges, ...).


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 186 at r1 (raw file):

Previously, kvoli (Austen) wrote…

I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.

This cap is too low. Capping at 0.99 or 1.0 for safety, so people don't think it's a percent and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled as a result. Normally, it doesn't make sense for the default value to be the same as the max allowed value. I also think it makes sense to add some warnings about setting this too low (but not to add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected: the 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 211 at r2 (raw file):

	},
)

Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 618 at r2 (raw file):

type DiskOptions struct {
	RebalanceToThreshold float64
	MaxThreshold         float64

nit: Consider renaming this to ShedThreshold to clarify what happens when a store hits this threshold.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1235 at r2 (raw file):

			Capacity: roachpb.StoreCapacity{
				Capacity:   100,
				Available:  100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)),

Consider writing this as `100 - int64(100*float64(defaultRebalanceToMaxDiskUtilizationThreshold)) - 1`. As written, it is "right on" the border and could cause confusing off-by-one errors.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1247 at r2 (raw file):

			Capacity: roachpb.StoreCapacity{
				Capacity:   100,
				Available:  (100 - int64(100*float64(defaultMaxDiskUtilizationThreshold))) / 2,

Consider writing as `100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) - 1`


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1310 at r2 (raw file):

			a.Metrics,
		)
		if expResult := (i >= 3); expResult != result {

Why did the expected result change? Add a comment on why this is 3.

@kvoli kvoli force-pushed the 230221.disk-thresholds branch 2 times, most recently from bec2f94 to d6a3925 on March 7, 2023 19:47
Previously, the store disk utilization was checked against a constant
threshold to determine if the store was a valid allocation target.

This commit adds two cluster settings to replace the constant.

`kv.allocator.max_disk_utilization_threshold`

Maximum disk utilization before a store will never be used as a
rebalance or allocation target and will actively have replicas moved off
of it.

`kv.allocator.rebalance_to_max_disk_utilization_threshold`

Maximum disk utilization before a store will never be used as a
rebalance target.

Resolves: cockroachdb#97392

Release note (ops change): Introduce two cluster settings to control disk
utilization thresholds for allocation.
`kv.allocator.rebalance_to_max_disk_utilization_threshold`: Maximum disk
utilization before a store will never be used as a rebalance target.
`kv.allocator.max_disk_utilization_threshold`: Maximum disk utilization
before a store will never be used as a rebalance or allocation target
and will actively have replicas moved off of it.
@kvoli kvoli force-pushed the 230221.disk-thresholds branch from d6a3925 to e003e83 on March 7, 2023 19:56
@kvoli (Collaborator, Author) left a comment:

Thanks for taking a look. I've updated the patch in response to your feedback. It should be good for another round of review @andrewbaptist.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @tbg)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1915 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: I don't love the name DiskOptions as it isn't clear this is specifically about Capacity. DiskCapacityOptions would be better, but not necessarily worth changing everything for. There are other Options about disks that are not captured here (max throughput, number of ranges, ...).

Good point. Updated to DiskCapacityOptions.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 186 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

This cap is too low. Capping at 0.99 or 1.0 for safety, so people don't think it's a percent and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled as a result. Normally, it doesn't make sense for the default value to be the same as the max allowed value. I also think it makes sense to add some warnings about setting this too low (but not to add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected: the 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.

Updated to a >0.99 check. I also added a min check at 0 to prevent negative numbers. I don't think this actually matters, but it's there for completeness' sake.
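
For reference, a minimal sketch of the bounds described here, assuming the final accepted range is [0, 0.99]; validateDiskThreshold is a hypothetical stand-in for the validation attached to the settings, not the merged code.

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/errors"
)

// validateDiskThreshold is a hypothetical stand-in for the validation
// described above: reject values above 0.99 (so a value like '90' meant as
// a percent fails loudly) and below 0 (to rule out negatives).
func validateDiskThreshold(f float64) error {
	if f > 0.99 {
		return errors.Errorf(
			"cannot set disk utilization threshold greater than 0.99: %f", f)
	}
	if f < 0 {
		return errors.Errorf(
			"cannot set disk utilization threshold less than 0: %f", f)
	}
	return nil
}

func main() {
	for _, v := range []float64{0.95, 90, -0.1} {
		fmt.Printf("%v -> %v\n", v, validateDiskThreshold(v))
	}
}
```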


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 211 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.

Updated to a >0.99 check. Added a note not to set this higher than the other setting, and vice versa on the other setting.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 618 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: Consider renaming this to ShedThreshold to clarify what happens when a store hits this threshold.

It is Shed + Block Allocation + Block Rebalancing. I'll update it to ShedAndBlockAll.
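
Putting the two renames together, a sketch of what the resulting options struct might look like; the exact merged field names and doc comments are an assumption.

```go
// DiskCapacityOptions combines both renames from this review: the struct
// rename from the earlier thread and the ShedAndBlockAll rename here.
type DiskCapacityOptions struct {
	// RebalanceToThreshold: above this utilization, the store is no longer
	// used as a rebalance target.
	RebalanceToThreshold float64
	// ShedAndBlockAllThreshold: above this utilization, the store is never
	// an allocation or rebalance target, and replicas are actively shed.
	ShedAndBlockAllThreshold float64
}
```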


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1235 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Consider writing this as `100 - int64(100*float64(defaultRebalanceToMaxDiskUtilizationThreshold)) - 1`. As written, it is "right on" the border and could cause confusing off-by-one errors.

I've updated this to be clearer.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1247 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Consider writing as `100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) - 1`

Ditto.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1310 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Why did the expected result change? Add a comment on why this is 3.

Previously, the storepool wouldn't update the average range count for a StoreList if the candidate store didn't pass the maxCapacityCheck. I removed this to break the dependency on the constant.

This logic was already redundant for rebalancing, as we create equivalence classes based on stores being equally valid/diverse and not disk-full, so the store list would never contain a full-disk store that could affect the range count. For allocation, I updated the logic to be similar, so we check earlier and remove candidates before creating a valid store list.

This test was previously asserting that s3, which had 5 ranges (all stores sum = 27, num stores = 5), would be rebalanced away from, because s4 and s5 would not have their range counts added to the store list's average range count. The average for the store list would previously be 7/5 = 1.4, so we would expect shouldRebalanceAway to be true for s3 since it is greater than the mean. That is no longer the case, so I've updated the test.
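
A small runnable sketch of the averaging change, using the numbers from this comment (27 ranges across 5 stores, with s3 holding 5 and the full-disk stores s4 and s5 holding the remaining 20 between them); the per-store split beyond that is an assumption.

```go
package main

import "fmt"

type store struct {
	name       string
	rangeCount int
	fullDisk   bool
}

func main() {
	// The s1/s2 and s4/s5 splits are assumptions; only the totals come from
	// the comment above.
	stores := []store{
		{"s1", 1, false}, {"s2", 1, false}, {"s3", 5, false},
		{"s4", 10, true}, {"s5", 10, true},
	}

	// Old behavior: full-disk stores' range counts were excluded from the
	// sum, so the mean was 7/5 = 1.4 and s3 (5 > 1.4) looked like a
	// shouldRebalanceAway candidate.
	oldSum := 0
	for _, s := range stores {
		if !s.fullDisk {
			oldSum += s.rangeCount
		}
	}
	fmt.Printf("old mean: %.1f\n", float64(oldSum)/float64(len(stores)))

	// New behavior: every store contributes, so the mean is 27/5 = 5.4 and
	// s3 (5 < 5.4) is no longer above the mean.
	newSum := 0
	for _, s := range stores {
		newSum += s.rangeCount
	}
	fmt.Printf("new mean: %.1f\n", float64(newSum)/float64(len(stores)))
}
```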

@kvoli kvoli requested a review from andrewbaptist March 7, 2023 20:30
@andrewbaptist (Collaborator) left a comment:

:lgtm: Thanks for the quick turnaround on this. We should absolutely merge this in because it will be helpful to have these knobs, but do you have an idea how a system would respond if it had an issue like the customer issue we had related to this? I assume it would keep the disks from getting full, but do you know if there would be other negative implications on rebalancing?

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @tbg)

@kvoli (Collaborator, Author) commented Mar 8, 2023

Previously, andrewbaptist (Andrew Baptist) wrote…

:lgtm: Thanks for the quick turnaround on this. We should absolutely merge this in because it will be helpful to have these knobs, but do you have an idea how a system would respond if it had an issue like the customer issue we had related to this? I assume it would keep the disks from getting full, but do you know if there would be other negative implications on rebalancing?

There would only be negative effects when there's a combination of other issues on top of full disks, such as overload on every store whose disk isn't full.

I think this should generally be pretty robust unless misconfigured.

@kvoli (Collaborator, Author) commented Mar 8, 2023

bors r=andrewbaptist

craig bot (Contributor) commented Mar 8, 2023

Build failed.

@kvoli (Collaborator, Author) commented Mar 8, 2023

CI failure could be legit. I'll investigate tomorrow.

bors r-

@kvoli (Collaborator, Author) commented Mar 8, 2023

bors r=andrewbaptist

craig bot (Contributor) commented Mar 8, 2023

Build succeeded.

kvoli added a commit to kvoli/cockroach that referenced this pull request Apr 4, 2023
In cockroachdb#97409 we introduced cluster settings to control the disk
fullness threshold for rebalancing towards a store and shedding replicas
off of the store. The `TestAllocatorFullDisks` test assumes the total
number of range bytes is equal to or less than the rebalance threshold of
the nodes; however, the test was updated to use the shed threshold
instead. This caused the test to flake occasionally, as there was more
than the expected amount of total range bytes.

This patch changes the ranges-per-node calculation to use the rebalance
threshold again, instead of the shed threshold.

Fixes: cockroachdb#100033

Release note: None
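
A hypothetical sketch of the fix's arithmetic follows; the function name and signature are illustrative, not the test's actual code.

```go
// rangesPerNode sizes the test's per-node range count from the rebalance
// threshold (not the shed threshold), so total range bytes stay at or below
// the point where stores remain valid rebalance targets.
func rangesPerNode(capacityBytes, rangeSize int64, rebalanceToThreshold float64) int64 {
	return int64(float64(capacityBytes) * rebalanceToThreshold / float64(rangeSize))
}
```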
craig bot pushed a commit that referenced this pull request Apr 4, 2023
100189:  kvcoord: Restart ranges on a dedicated goroutine. r=miretskiy a=miretskiy

Restart ranges on a dedicated goroutine (if needed).
Fix logic bug in stuck range handling.
Increase verbosity of logging to help debug mux rangefeed issues.

Informs #99560
Informs #99640
Informs #99214
Informs #98925
Informs #99092
Informs #99212
Informs #99910

Release note: None

100525: rpc: Handle closed error r=erikgrinaker a=andrewbaptist

We close the listener before closing the connection. This can result in a spurious failure due to the Listener also closing our connection.

Epic: none
Fixes: #100391
Fixes: #77754
Informs: #80034

Release note: None

100528: sql: fix flaky TestSQLStatsCompactor r=j82w a=j82w

The test failure shows more total wide scans than expected. Change the
compact stats job to run once a year to avoid it running at the same time
as the test.

The interceptor is disabled right after the delete, reducing the
possibility of another operation causing a conflict.

Epic: none
closes: #99653

Release note: none

100589: allocator: deflake full disk test r=andrewbaptist a=kvoli

In #97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. The `TestAllocatorFullDisks` test assumes the total number of range bytes is equal to or less than the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there was more than the expected amount of total range bytes.

This patch changes the ranges-per-node calculation to use the rebalance threshold again, instead of the shed threshold.

```
dev test pkg/kv/kvserver/allocator/allocatorimpl -f TestAllocatorFullDisks -v --stress
...
15714 runs so far, 0 failures, over 39m45s
```

Fixes: #100033

Release note: None

100610: roachtest: set config.Quiet to true r=herkolategan a=srosenberg

After refactoring in [1], the default of config.Quiet was set to false, since the roachprod CLI option is intended to set it to true. This resulted in an unwanted side effect, namely roachtests running with the new default. Consequently, test_runner's log ended up with a bunch of (terminal) escape codes due to the (status) spinner.

This change ensures roachtest explicitly sets config.Quiet to true.

[1] #99133

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Apr 4, 2023