allocator: make disk capacity threshold a setting #97409
Conversation
Force-pushed b912b40 to f442715.
Force-pushed f442715 to afff942.
return errors.Errorf(
	"Cannot set kv.allocator.max_disk_utilization_threshold greater than 0.95")
}
if f < 0.05 {
Interesting, we're allowing setting this to near zero? Almost seems as though we could abuse this to implement "gateway-only nodes" (nodes that don't hold any replicas) if we allowed setting this to 0.00001.
Just a drive-by.
I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.
Force-pushed 5b37f25 to 2aea517.
Reviewed 3 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @tbg)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1915 at r2 (raw file):
func (a *Allocator) ScorerOptions(ctx context.Context) *RangeCountScorerOptions {
	return &RangeCountScorerOptions{
		DiskOptions: a.DiskOptions(),
nit: I don't love the name `DiskOptions`, as it isn't clear this is specifically about Capacity. `DiskCapacityOptions` would be better, but not necessarily worth changing everything for. There are other `Options` about disks that are not captured here (max throughput, number of ranges, ...).
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 186 at r1 (raw file):
Previously, kvoli (Austen) wrote…
I didn't want to assume too much w.r.t. the threshold. The 0.95 max seems good, but the 0.05 could be changed or removed.
This cap is too low. Capping it at 0.99 or 1.0 for safety, so people don't think it's a percentage and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled because of that. Normally, it doesn't make sense for the default value to be the same as the maximum allowed value. I also think it makes sense to add some warnings about setting this too low (but not add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected. The 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 211 at r2 (raw file):
	},
)
Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 618 at r2 (raw file):
type DiskOptions struct {
	RebalanceToThreshold float64
	MaxThreshold         float64
nit: Consider renaming this `ShedThreshold` to clarify what happens when it hits this threshold.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1235 at r2 (raw file):
Capacity: roachpb.StoreCapacity{
	Capacity:  100,
	Available: 100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)),
Consider writing this as: 100 - int64(100*float64(defaultRebalanceToMaxDiskUtilizationThreshold)) -1
as written it is "right on" the border, and could cause confusing off-by-one errors.
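As an illustration of the "one past the border" pattern being suggested, here is a minimal, self-contained Go sketch; the constant's value is an assumption standing in for the package-level default, and the helper name is invented for the example:

```go
package allocatorimpl

// The constant name follows the suggestion above; its value here is only an
// assumption so the example stands alone.
const defaultRebalanceToMaxDiskUtilizationThreshold = 0.925

// availableJustOverThreshold returns an Available value that puts a store
// one unit past the rebalance-to threshold instead of exactly on the border,
// avoiding confusing off-by-one behavior in a test.
func availableJustOverThreshold(capacity int64) int64 {
	return capacity - int64(float64(capacity)*defaultRebalanceToMaxDiskUtilizationThreshold) - 1
}
```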
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1247 at r2 (raw file):
Capacity: roachpb.StoreCapacity{
	Capacity:  100,
	Available: (100 - int64(100*float64(defaultMaxDiskUtilizationThreshold))) / 2,
Consider writing as 100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1310 at r2 (raw file):
	a.Metrics,
)
if expResult := (i >= 3); expResult != result {
Why did the expected result change? Add a comment on why this is 3.
Force-pushed bec2f94 to d6a3925.
Force-pushed d6a3925 to e003e83.
Thanks for taking a look. I've updated the patch in response to your feedback. It should be good for another round of review @andrewbaptist.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @tbg)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1915 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: I don't love the name `DiskOptions`, as it isn't clear this is specifically about Capacity. `DiskCapacityOptions` would be better, but not necessarily worth changing everything for. There are other `Options` about disks that are not captured here (max throughput, number of ranges, ...).
Good point. Updated to DiskCapacityOptions
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 186 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
This cap is too low. Capping it at 0.99 or 1.0 for safety, so people don't think it's a percentage and set it to something like '90', is useful, but 0.95 is too constraining. I could see a situation where the entire system is close to 95% full and the allocator is mostly disabled because of that. Normally, it doesn't make sense for the default value to be the same as the maximum allowed value. I also think it makes sense to add some warnings about setting this too low (but not add a check). If this is set to something like 0.5 because a user thinks they want to see their system balanced, it will not work as expected. The 0.5 will be hit, but the net effect is that other rebalancing will be broken and the system could even become unstable.
Updated to >0.99. I also added a min check at 0 to prevent negative numbers. I don't think this actually matters, but it's there for completeness' sake.
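For illustration, a minimal sketch of what a validator with these bounds might look like, following the errors.Errorf pattern from the quoted diff; this is an approximation, not the exact code in the PR:

```go
package allocatorimpl

import "github.com/cockroachdb/errors"

// validateDiskUtilizationThreshold is a hypothetical validator mirroring the
// bounds discussed here: values above 0.99 are rejected (likely a
// misconfigured percentage such as "90"), as are negative values.
func validateDiskUtilizationThreshold(f float64) error {
	if f > 0.99 {
		return errors.Errorf(
			"cannot set kv.allocator.max_disk_utilization_threshold greater than 0.99")
	}
	if f < 0 {
		return errors.Errorf(
			"cannot set kv.allocator.max_disk_utilization_threshold less than 0")
	}
	return nil
}
```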
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 211 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Similar to the comment above: set the max allowed to a higher value, and add a note that this should be set lower than the previous setting and also not set too aggressively low. Also, do we need a check that this is always strictly less than the one above? It might be OK to have no buffer, but as you note, in the system this isn't ideal.
Updated to a >0.99 check. Added a note not to set this higher than the other setting, and vice versa on the other setting.
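If a strict ordering check were ever added, a hypothetical helper might look like the following; the PR itself only adds documentation notes, so this is purely illustrative and the function name is invented:

```go
package allocatorimpl

import "github.com/cockroachdb/errors"

// validateThresholdOrdering sketches the strict-ordering check being
// discussed: the rebalance-to threshold should not exceed the
// shed/block-all threshold.
func validateThresholdOrdering(rebalanceTo, maxDiskUtil float64) error {
	if rebalanceTo > maxDiskUtil {
		return errors.Errorf(
			"kv.allocator.rebalance_to_max_disk_utilization_threshold (%f) should not exceed "+
				"kv.allocator.max_disk_utilization_threshold (%f)", rebalanceTo, maxDiskUtil)
	}
	return nil
}
```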
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 618 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: Consider renaming this `ShedThreshold` to clarify what happens when it hits this threshold.
It is Shed + Block Allocation + Block Rebalancing. I'll update to be ShedAndBlockAll
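As a hedged sketch of where that rename could land — the field name and the simplified check below are assumptions based on this thread, not necessarily the final code:

```go
package allocatorimpl

// DiskCapacityOptions is a sketch of the renamed options struct discussed in
// this thread.
type DiskCapacityOptions struct {
	// RebalanceToThreshold blocks a store from being used as a rebalance
	// target once its disk utilization exceeds it.
	RebalanceToThreshold float64
	// ShedAndBlockAllThreshold additionally blocks allocation and causes
	// replicas to be actively shed off the store.
	ShedAndBlockAllThreshold float64
}

// maxCapacityCheck is a simplified illustration (not the allocator's actual
// signature) of how the shed threshold might gate a candidate store, where
// utilization is the fraction of disk capacity in use, in [0, 1].
func (do DiskCapacityOptions) maxCapacityCheck(utilization float64) bool {
	return utilization < do.ShedAndBlockAllThreshold
}
```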
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1235 at r2 (raw file):
100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
I've updated this to be clearer.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1247 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Consider writing as
100 - int64(100*float64(defaultMaxDiskUtilizationThreshold)) -1
Ditto.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 1310 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Why did the expected result change? Add a comment on why this is 3.
Previously, the storepool wouldn't update the average range count for a StoreList if the candidate store didn't pass the maxCapacityCheck. I removed this to break the dependency on the constant.
This logic was already redundant for rebalancing, as we create equivalence classes based on candidates being equally valid/diverse and not disk-full, so the store list would never contain a full-disk store to affect the range count. For allocation, I updated the logic to be similar, so we check earlier and remove candidates before creating a valid store list.
This test was previously asserting that s3, which had 5 ranges (all stores sum = 27, num stores = 5), would be rebalanced away from, because s4 and s5 would not have their range count added to the store list's average range count. The average for the store list would previously be 7/5 = 1.4, so we would expect shouldRebalanceAway for s3 to be true since it is greater than the mean. That is no longer the case, so I've updated the test.
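A rough, hypothetical sketch of the "filter first, then build the store list" ordering described here; the types and function names are invented for illustration, not the actual allocator code:

```go
package allocatorimpl

// candidateStore is a stand-in for the store information the allocator
// considers; only the fields this sketch needs are included.
type candidateStore struct {
	RangeCount      int
	DiskUtilization float64 // fraction of disk capacity in use, in [0, 1]
}

// filterFullDiskStores drops stores over the shed/block-all threshold before
// the store list (and its average range count) is built, mirroring the
// "check earlier and remove candidates" ordering described above.
func filterFullDiskStores(stores []candidateStore, shedThreshold float64) []candidateStore {
	var filtered []candidateStore
	for _, s := range stores {
		if s.DiskUtilization < shedThreshold {
			filtered = append(filtered, s)
		}
	}
	return filtered
}

// meanRangeCount averages the range count over the remaining candidates, so
// full-disk stores can no longer be silently skipped from (or skew) the mean.
func meanRangeCount(stores []candidateStore) float64 {
	if len(stores) == 0 {
		return 0
	}
	total := 0
	for _, s := range stores {
		total += s.RangeCount
	}
	return float64(total) / float64(len(stores))
}
```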
Thanks for the quick turnaround on this. We should absolutely merge this in because it will be helpful to have these knobs, but do you have an idea how a system would respond if it had an issue like the customer issue we had related to this? I assume it would keep the disks from getting full, but do you know if there would be other negative implications on rebalancing?
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @tbg)
There would only be negative effects when there's a combination of other issues as well as a full disk, such as overload on every non-full-disk store. I think this should generally be pretty robust unless misconfigured.
bors r=andrewbaptist
Build failed:
CI failure could be legit. I'll investigate tomorrow. bors r-
bors r=andrewbaptist
Build succeeded:
In cockroachdb#97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. `TestAllocatorFullDisks` assumes the total number of range bytes is equal to or less than the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there was more than the expected amount of total range bytes.

This patch changes the ranges-per-node calculation to use the rebalance threshold again, instead of the shed threshold.

Fixes: cockroachdb#100033

Release note: None
100189: kvcoord: Restart ranges on a dedicated goroutine. r=miretskiy a=miretskiy

Restart ranges on a dedicated goroutine (if needed). Fix logic bug in stuck range handling. Increase verbosity of logging to help debug mux rangefeed issues.

Informs #99560
Informs #99640
Informs #99214
Informs #98925
Informs #99092
Informs #99212
Informs #99910

Release note: None

100525: rpc: Handle closed error r=erikgrinaker a=andrewbaptist

We close the listener before closing the connection. This can result in a spurious failure due to the Listener also closing our connection.

Epic: none
Fixes: #100391
Fixes: #77754
Informs: #80034

Release note: None

100528: sql: fix flaky TestSQLStatsCompactor r=j82w a=j82w

The test failure is showing more total wide scans than expected. Change the compact stats job to run once a year to avoid it running at the same time as the test. The interceptor is disabled right after delete, reducing the possibility of another operation causing a conflict.

Epic: none
closes: #99653

Release note: none

100589: allocator: deflake full disk test r=andrewbaptist a=kvoli

In #97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. `TestAllocatorFullDisks` assumes the total number of range bytes is equal to or less than the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there was more than the expected amount of total range bytes. This patch changes the ranges-per-node calculation to use the rebalance threshold again, instead of the shed threshold.

```
dev test pkg/kv/kvserver/allocator/allocatorimpl -f TestAllocatorFullDisks -v --stress
...
15714 runs so far, 0 failures, over 39m45s
```

Fixes: #100033

Release note: None

100610: roachtest: set config.Quiet to true r=herkolategan a=srosenberg

After refactoring in [1], the default of config.Quiet was set to false since the roachprod CLI option is intended to set it to true. This resulted in an unwanted side effect, namely roachtests running with the new default. Consequently, test_runner's log ended up with a bunch of (terminal) escape codes due to the (status) spinner. This change ensures roachtest explicitly sets config.Quiet to true.

[1] #99133

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
Previously, the store disk utilization was checked against a constant
threshold to determine if the store was a valid allocation target.
This commit adds two cluster settings to replace the constant.
kv.allocator.max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a
rebalance or allocation target and will actively have replicas moved off
of it.
kv.allocator.rebalance_to_max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a
rebalance target.
Resolves: #97392
Release note (ops change): Introduce two cluster settings to control disk
utilization thresholds for allocation.
kv.allocator.rebalance_to_max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a rebalance target.
kv.allocator.max_disk_utilization_threshold
Maximum disk utilization before a store will never be used as a rebalance or allocation target
and will actively have replicas moved off of it.
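To make the relationship between the two thresholds concrete, here is a hypothetical Go sketch; it is not the PR's actual code, and the function and parameter names are invented for illustration:

```go
package allocatorimpl

// storeDiskDecision illustrates how the two settings described above
// interact for a store whose disk utilization is a fraction in [0, 1]:
//   - at or above maxDiskUtil: never an allocation or rebalance target, and
//     replicas are actively shed off the store.
//   - at or above rebalanceToMaxDiskUtil (but below maxDiskUtil): never a
//     rebalance target, though allocation is not blocked.
func storeDiskDecision(util, rebalanceToMaxDiskUtil, maxDiskUtil float64) (canAllocate, canRebalanceTo, shed bool) {
	switch {
	case util >= maxDiskUtil:
		return false, false, true
	case util >= rebalanceToMaxDiskUtil:
		return true, false, false
	default:
		return true, true, false
	}
}
```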