storage: reapply the rule solver #10252
Reviewed 12 of 14 files at r1.

pkg/storage/allocator.go, line 115 at r1 (raw file):
this can be a value

pkg/storage/allocator.go, line 174 at r1 (raw file):
hm, seems like this wants to return a value now instead of a pointer

pkg/storage/allocator.go, line 185 at r1 (raw file):
how come the randomness that was previously here is no longer needed?

pkg/storage/allocator.go, line 201 at r1 (raw file):
expand this rationale; why is it currently required, why might it not be required? why is it OK to use a bogus range ID?

pkg/storage/allocator.go, line 207 at r1 (raw file):
seems like you can initialize this to

pkg/storage/allocator.go, line 217 at r1 (raw file):
why is it OK to skip checking the error?

pkg/storage/allocator.go, line 269 at r1 (raw file):
Seems easy enough to fix in this PR by passing the store list as an argument to Solve; WDYT?

pkg/storage/allocator.go, line 299 at r1 (raw file):
I had to squint to figure out what this counter was doing. How about this instead:

pkg/storage/allocator_test.go, line 674 at r1 (raw file):
doesn't this want to be inside

pkg/storage/allocator_test.go, line 1435 at r1 (raw file):
please revert this, or put it in a separate commit. This test is important for validating that the allocator's behaviour didn't inadvertently change, and including this graphical change in the rest of the change makes it difficult to spot interesting changes.

pkg/storage/client_raft_test.go, line 1917 at r1 (raw file):
why'd this change? the diff alone isn't enough?

pkg/storage/store_pool.go, line 216 at r1 (raw file):
this doesn't need to be under lock, does it?

pkg/storage/store_pool_test.go, line 42 at r1 (raw file):
this has one caller.
Review status: 12 of 14 files reviewed at latest revision, 15 unresolved discussions, all commit checks successful. pkg/storage/allocator.go, line 174 at r1 (raw file):
This change is significant enough that I think we want to enable the new rule-based allocator via an env var and have it disabled by default.

Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.

pkg/storage/rule_solver.go, line 81 at r1 (raw file):
The generality of a configurable list of rules feels like overkill to me.

pkg/storage/rule_solver.go, line 122 at r1 (raw file):
Does any caller ever use the list? Might be better to return a single candidate that is randomly selected here. One thought on reintroducing randomness: shuffle

pkg/storage/rule_solver.go, line 127 at r1 (raw file):
It is super confusing that

Oh, I like that idea a lot. Going to take a bit of plumbing since this touches a lot. But I think it's doable. And a note on this PR: my goal here is to bring back the code from the original commits more than to entirely fix them. Adding the env flag should allow some of the details (like how it's not correctly random) to be addressed in follow-up PRs. Otherwise, this PR would just get out of hand. So easy fixes are going in, but substantial changes I'm going to reserve for follow-ups.

I'd be fine with seeing this go in without any fixes as long as it is disabled by default.

Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.
pkg/storage/allocator.go, line 185 at r1 (raw file):
This adds back in 3 commits that were removed to facilitate the merge of develop back to master. One other commit is no longer required. Follow-up fixes are tracked in cockroachdb#10275.
Closes cockroachdb#9336
1) 4446345 storage: add constraint rule solver for allocation
Rules are represented as a single function that returns the candidacy of the store as well as a float value representing the score. The scores from all rules are then aggregated, and the stores are returned sorted by score.
Current rules:
- ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
- ruleConstraints enforces that required and prohibited constraints are followed, and that stores with more positive constraints are ranked higher.
- ruleDiversity ensures that nodes that have the fewest locality tiers in common are given higher priority.
- ruleCapacity prioritizes placing data on empty nodes when the choice is available and prevents data from going onto mostly full nodes.
2) dd3229a storage: implemented RuleSolver into allocator
3) 27353a8 storage: removed unused rangeCountBalancer
There was a 4th commit that is no longer required. The simulation was already converging since adding a rebalance threshold.
4e29a36 storage/simulation: only rebalance 50% of ranges on each iteration so it will converge
Review status: 12 of 14 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful. pkg/storage/allocator.go, line 217 at r1 (raw file):
Fixed a number of the issues. Haven't added the env variable yet. That will be next. Review status: 9 of 14 files reviewed at latest revision, 18 unresolved discussions. pkg/storage/allocator.go, line 115 at r1 (raw file):
Review status: 9 of 14 files reviewed at latest revision, 15 unresolved discussions, some commit checks pending. pkg/storage/allocator.go, line 115 at r1 (raw file):
Please move the second commit into a separate PR altogether. I would like to see this PR go in with no change in the output of Reviewed 3 of 5 files at r2. pkg/storage/allocator.go, line 115 at r1 (raw file):
Closing this PR; I'll be reapplying the rule solver as a collection of smaller PRs.
Instead of applying cockroachdb@1ef40f3 or cockroachdb#10252, this finishes the reapplication of the rule solver. However, this also puts the rule solver under the environment flag COCKROACH_ENABLE_RULE_SOLVER for ease of testing and defaults to not enabled. The follow up to this commit is cockroachdb#10275 and a lot of testing to ensure that the rule solver does indeed perform as expected. Closes cockroachdb#9336
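For readers following along, here is a minimal sketch of how a default-off environment flag like this could gate the allocator path. Only the flag name COCKROACH_ENABLE_RULE_SOLVER comes from the commit message; `ruleSolverEnabled`, `chooseAllocator`, and the injected `lookup` function are hypothetical names for illustration, and the real code may read the flag differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envEnableRuleSolver is the flag name quoted in the commit message.
const envEnableRuleSolver = "COCKROACH_ENABLE_RULE_SOLVER"

// ruleSolverEnabled reports whether the flag is set to a true value.
// lookup is injected (os.LookupEnv in normal use) so the function can be
// exercised without touching the process environment. An unset or
// unparsable value falls back to false, i.e. "defaults to not enabled".
func ruleSolverEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(envEnableRuleSolver)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return enabled
}

// chooseAllocator is a hypothetical switch point between the two code paths.
func chooseAllocator(enabled bool) string {
	if enabled {
		return "rule solver"
	}
	return "existing balancer"
}

func main() {
	fmt.Println("allocating with:", chooseAllocator(ruleSolverEnabled(os.LookupEnv)))
}
```

Treating an unparsable value as "disabled" is a design choice in this sketch, picked so that a typo in the flag can never silently turn the new path on.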
Instead of applying 1ef40f3 or cockroachdb#10252, this finishes the reapplication of the rule solver. However, this also puts the rule solver under the environment flag COCKROACH_ENABLE_RULE_SOLVER for ease of testing and defaults to not enabled.
This commit re-applies the rule solver, specifically the following commits:
1) 4446345 storage: add constraint rule solver for allocation
Rules are represented as a single function that returns the candidacy of the store as well as a float value representing the score. The scores from all rules are then aggregated, and the stores are returned sorted by score.
Current rules:
- ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
- ruleConstraints enforces that required and prohibited constraints are followed, and that stores with more positive constraints are ranked higher.
- ruleDiversity ensures that nodes that have the fewest locality tiers in common are given higher priority.
- ruleCapacity prioritizes placing data on empty nodes when the choice is available and prevents data from going onto mostly full nodes.
2) dd3229a storage: implemented RuleSolver into allocator
The follow-up to this commit is cockroachdb#10275 and a lot of testing to ensure that the rule solver does indeed perform as expected.
Closes cockroachdb#9336
This adds back in 3 commits that were removed to facilitate the merge of develop
back to master. One other commit is no longer required.
Closes #9336
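4446345
storage: add constraint rule solver for allocation
Rules are represented as a single function that returns the candidacy of the
store as well as a float value representing the score. The scores from all
rules are then aggregated, and the stores are returned sorted by score.
Current rules:
- ruleReplicasUniqueNodes ensures that no two replicas are put on the same node.
- ruleConstraints enforces that required and prohibited constraints are
  followed, and that stores with more positive constraints are ranked higher.
- ruleDiversity ensures that nodes that have the fewest locality tiers in common
  are given higher priority.
- ruleCapacity prioritizes placing data on empty nodes when the choice is
  available and prevents data from going onto mostly full nodes.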
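The rule shape described above (one function per rule returning candidacy plus a float score, scores aggregated, stores returned sorted) can be sketched roughly as follows. This is an illustration under assumptions, not the code in pkg/storage/rule_solver.go: the `store`, `state`, and `candidate` types, the `solve` function, the summing of scores, and the 0.95 fullness cutoff are all hypothetical, and only simplified versions of the unique-nodes and capacity rules are shown.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a simplified, hypothetical stand-in for a store descriptor.
type store struct {
	StoreID      int
	NodeID       int
	FractionUsed float64 // 0.0 (empty) to 1.0 (full)
}

// state carries what the rules need to know about the range being placed.
type state struct {
	existingNodes map[int]bool // nodes that already hold a replica
}

// rule reports whether a store is a valid candidate and, if so, its score.
type rule func(s store, st state) (valid bool, score float64)

// ruleUniqueNodes rejects stores on nodes that already hold a replica.
func ruleUniqueNodes(s store, st state) (bool, float64) {
	return !st.existingNodes[s.NodeID], 0
}

// maxFractionUsed is an assumed cutoff for "mostly full" stores.
const maxFractionUsed = 0.95

// ruleCapacity rejects mostly full stores and scores emptier stores higher.
func ruleCapacity(s store, _ state) (bool, float64) {
	if s.FractionUsed > maxFractionUsed {
		return false, 0
	}
	return true, 1 - s.FractionUsed
}

// candidate pairs a store with its aggregated score.
type candidate struct {
	Store store
	Score float64
}

// solve applies every rule to every store, drops stores that any rule
// rejects, sums the scores of the rest, and returns them best-first.
func solve(stores []store, st state, rules []rule) []candidate {
	var out []candidate
	for _, s := range stores {
		total, ok := 0.0, true
		for _, r := range rules {
			valid, score := r(s, st)
			if !valid {
				ok = false
				break
			}
			total += score
		}
		if ok {
			out = append(out, candidate{Store: s, Score: total})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	stores := []store{
		{StoreID: 1, NodeID: 1, FractionUsed: 0.10},
		{StoreID: 2, NodeID: 2, FractionUsed: 0.50},
		{StoreID: 3, NodeID: 3, FractionUsed: 0.99},
		{StoreID: 4, NodeID: 4, FractionUsed: 0.20},
	}
	st := state{existingNodes: map[int]bool{1: true}}
	for _, c := range solve(stores, st, []rule{ruleUniqueNodes, ruleCapacity}) {
		fmt.Printf("store %d score %.2f\n", c.Store.StoreID, c.Score)
	}
}
```

In this example store 1 is rejected because node 1 already holds a replica and store 3 is rejected as mostly full, leaving store 4 ranked ahead of store 2. The sort is deterministic, so a sketch like this does not address the randomness concern raised in review.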
dd3229a
storage: implemented RuleSolver into allocator
27353a8
storage: removed unused rangeCountBalancer
There was a 4th commit that is no longer required. The simulation was already
converging since adding a rebalance threshold.
4e29a36
storage/simulation: only rebalance 50% of ranges on each iteration so it will
converge