-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add constraint rule solver for allocation #8786
Conversation
Since Peter and Ben are both out, @tamird, would you mind looking at this? |
I haven't read the RFC. @bdarnell: will you be able to give this some On Wed, Aug 24, 2016 at 2:25 PM, Tristan Rice [email protected]
|
Review status: 0 of 40 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. storage/rule_solver.go, line 54 [r4] (raw file):
In general I prefer to see types defined before their first use. Move Document the inputs and outputs of this function where it's not trivial; in particular mention that the scores from different rules are summed. storage/rule_solver.go, line 67 [r4] (raw file):
Comment doesn't match code. storage/rule_solver.go, line 83 [r4] (raw file):
Comment doesn't match code. storage/rule_solver.go, line 164 [r4] (raw file):
I think it would be a little simpler to combine the three constraint rules into a single pass over Constraints with a switch on the Type field. storage/rule_solver.go, line 233 [r4] (raw file):
How were these weights determined? I think diversity is more important than positive constraints. It would be good for all the weights to be collected in one place since they're only meaningful relative to each other. Maybe all the rules should return scores in the range 0-1 and then storage/rule_solver.go, line 236 [r4] (raw file):
How big is sl.stores? We only need descriptors for the stores in storage/rule_solver.go, line 248 [r4] (raw file):
This shouldn't be a panic - store descriptors are gossiped and cannot be updated atomically, so we have to allow for times when the set of tiers doesn't match. We'll have to just skip tiers that are not present. Comments from Reviewable |
@BramGrunier, @cdo: I'm not going to be able to give this a thorough review due to the stability code yellow. Can you guys pick this up? In particular, it is worth thinking about the performance implications of this approach and what can be done. Something that has been percolating in my mind is whether we can precompute lists of candidate stores given a set of constraints. My guess is that the sets of required and prohibited constraints required by zone configs will translate into a small(ish) number of sets of candidate stores. So performing an allocation would be a map lookup to find the set of candidate stores and then an additional bit of filtering and processing to ensure rules such as "unique nodes", "diversity" and "capacity" that can't be determined ahead of time. Review status: 0 of 40 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. Comments from Reviewable |
I'll give it a thorough review this morning. Comments from Reviewable |
Review status: 0 of 40 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. storage/rule_solver.go, line 233 [r4] (raw file):
|
Reviewed 3 of 3 files at r1, 25 of 25 files at r2, 18 of 18 files at r3, 3 of 3 files at r4. cli/cliflags/flags.go, line 63 [r3] (raw file):
Please add a number of other examples! config/config.go, line 112 [r2] (raw file):
Why short? config/config.go, line 121 [r2] (raw file):
Again, why short? config/config.proto, line 17 [r2] (raw file):
While here, why not make this a proto3? It should clean up a lot of the gogoproto stuff. config/config.proto, line 78 [r2] (raw file):
Why not just make this a repeated Constraint here? Why do you need the Constraints message? config/config_test.go, line 376 [r2] (raw file):
Probably a good idea to add a bunch of different test cases here. Empty constraints, single constraints, a bunch of constraints. config/config_test.go, line 387 [r2] (raw file):
Again, this seems unnecessary. The double constraints: seems strange. config/migration_test.go, line 33 [r2] (raw file):
You could make these YAML instead and maybe save some space. Not sure if it's worth it. config/migration_test.go, line 76 [r2] (raw file):
Please add an error case in which the migration failure happens. roachpb/metadata.go, line 207 [r3] (raw file):
Do we escape equal signs? roachpb/metadata.proto, line 124 [r3] (raw file):
I think (but I'm not sure) that if this is a proto3, you can just nest the tier inside the locality. sql/config.go, line 35 [r2] (raw file):
This old comment can go. storage/allocator.go, line 72 [r2] (raw file):
You're going to need to rename/update a lot of the places where attribute is used. storage/allocator.go, line 205 [r2] (raw file):
attrs seems like the wrong name now. Also, this should relax in a predetermined order now since we have different levels of constraints. storage/allocator_test.go, line 245 [r2] (raw file):
Can you add in the equivalent of this test with the new system? storage/rule_solver.go, line 164 [r4] (raw file):
|
Thanks, @BramGruneir! Also, not sure if you noticed, but there are a stack of PRs for this to split it. ZoneConfig format stuff should go in #8627 (which was already merged), CLI stuff should go in #8676. Review status: all files reviewed at latest revision, 24 unresolved discussions, some commit checks failed. cli/cliflags/flags.go, line 63 [r3] (raw file):
|
Ah, I didn't know. I was just doing a review of this alone. I'll take a look at both of those. Review status: all files reviewed at latest revision, 17 unresolved discussions, some commit checks failed. cli/cliflags/flags.go, line 63 [r3] (raw file):
|
Review status: all files reviewed at latest revision, 17 unresolved discussions, some commit checks failed. config/config_test.go, line 387 [r2] (raw file):
|
Latest updates also have a bunch of smallish changes that I made while waiting for comments. Review status: 8 of 21 files reviewed at latest revision, 17 unresolved discussions, some commit checks pending. Comments from Reviewable |
Review status: 8 of 21 files reviewed at latest revision, 17 unresolved discussions, some commit checks failed. config/config.proto, line 17 [r2] (raw file):
|
Review status: 8 of 21 files reviewed at latest revision, 16 unresolved discussions, some commit checks failed. storage/rule_solver.go, line 233 [r4] (raw file):
|
Reviewed 20 of 20 files at r5, 3 of 3 files at r6. cli/cliflags/flags.go, line 56 [r5] (raw file):
Could you explain what is or what is not implemented? Or link to an issue number or something? config/config.proto, line 17 [r2] (raw file):
|
Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed. cli/cliflags/flags.go, line 63 [r3] (raw file):
|
The rule solver looks good. So LGTM for that. Probably best to wait on Ben's approval too. Might want to rename the PR to remove the WIP as well. Reviewed 7 of 7 files at r7, 3 of 3 files at r8. Comments from Reviewable |
LGTM Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. Comments from Reviewable |
cli/cliflags/flags.go, line 63 [r3] (raw file):
|
Review status: 1 of 3 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending. config/config.proto, line 78 [r2] (raw file):
|
config/config.proto, line 78 [r2] (raw file):
|
config/config.proto, line 78 [r2] (raw file):
|
config/config.proto, line 78 [r2] (raw file):
|
I just switched the rule functions to take a @BramGruneir, if it's okay I think I'll make the constraint changes in a separate PR so it's easier to deal with. Review status: 1 of 3 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending. storage/rule_solver.go, line 236 [r4] (raw file):
|
Rules are represented as a single function that returns the candidacy of the store as well as a float value representing the score. These scores are then aggregated from all rules and returns the stores sorted by them. Current rules: - ruleReplicasUniqueNodes ensures that no two replicas are put on the same node. - ruleConstraints enforces that required and prohibited constraints are followed, and that stores with more positive constraints are ranked higher. - ruleDiversity ensures that nodes that have the fewest locality tiers in common are given higher priority. - ruleCapacity prioritizes placing data on empty nodes when the choice is available and prevents data from going onto mostly full nodes.
Closed after merging superset #8959. |
This builds on #8676, #8627.
The existing allocator is rather complicated and hard to understand. This is an attempt to make it easier to understand by having composable rules.
Rules are represented as a single function that returns the candidacy of the
store as well as a float value representing the score. These scores are then
aggregated from all rules and returns the stores sorted by them.
Current rules:
I'm slightly worried that this will be too slow when there are lots of stores and constraints.
cc @bdarnell @petermattis @BramGruneir
This change is