storage: allocator will actively move replicas due to zone configs #14106
a-robinson (Alex Robinson):
When you tested this on indigo, was it with or without load-based leaseholder rebalancing?

Reviewed 5 of 5 files at r1.

pkg/storage/allocator_test.go, line 2025 at r1 (raw file):
This is what we were discussing in the meeting, right? Whether a positive constraint should move all replicas there or just ensure there's at least one there?
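The two readings of a positive constraint come up repeatedly below. As a minimal sketch of the "ensure at least one" reading, using invented types rather than CockroachDB's actual descriptors:

```go
package main

import "fmt"

// Invented types for illustration; CockroachDB's real StoreDescriptor and
// Constraint live in pkg/roachpb and pkg/config and differ in detail.
type store struct {
	id    int
	attrs map[string]string
}

// satisfiesAtLeastOne implements the "at least one replica matches" reading
// of a positive constraint, as opposed to "move every replica to a match".
func satisfiesAtLeastOne(replicas []store, key, value string) bool {
	for _, s := range replicas {
		if s.attrs[key] == value {
			return true
		}
	}
	return false
}

func main() {
	replicas := []store{
		{id: 1, attrs: map[string]string{"region": "us"}},
		{id: 2, attrs: map[string]string{"region": "eur"}},
		{id: 3, attrs: map[string]string{"region": "us"}},
	}
	// Under the "at least one" reading, region=eur is already satisfied and
	// no move is needed; under "move all", two more moves would follow.
	fmt.Println(satisfiesAtLeastOne(replicas, "region", "eur")) // true
}
```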
pkg/storage/allocator_test.go, line 2029 at r1 (raw file):
What would happen to this case if we made the rebalance to 4 and then called the function again (i.e. if two replicas were in EUR, one in US, but there wasn't another EUR node to rebalance to)?

pkg/storage/client_raft_test.go, line 1134 at r1 (raw file):
For what it's worth, the changes in this file probably belonged in a separate commit and/or PR, but thanks for cleaning it up :)

pkg/storage/rule_solver.go, line 322 at r1 (raw file):
Is there a unit test that covers this and/or shouldRebalance?

pkg/storage/rule_solver.go, line 341 at r1 (raw file):
What's meant by

pkg/storage/rule_solver.go, line 354 at r1 (raw file):
Are

pkg/storage/rule_solver.go, line 366 at r1 (raw file):
Does the existence of a more preferred store really qualify as a constraints check issue? If nothing else, the variable names here would lead me to believe that it's just a preference, not a constraints requirement.

pkg/storage/rule_solver.go, line 368 at r1 (raw file):
Nit, but should this be s/try/trying? Commands like "try to ..." read funny in the logs.

pkg/storage/rule_solver.go, line 377 at r1 (raw file):
It's not an immediate concern, but how do you expect this method to scale with the number of stores in the cluster? We're iterating over all of them and doing some computations up to three different times, for all leaseholder replicas on each node.
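One common mitigation for this kind of cost is to score each store once per rebalance pass and reuse the result, roughly as in the invented-name sketch below; note that Bram's reply further down points out the locking cost a shared cache in the store pool would add:

```go
package main

import "fmt"

// scoreStore stands in for the real per-store work being discussed.
func scoreStore(storeID int) float64 {
	return float64(storeID) * 0.1
}

func main() {
	stores := []int{1, 2, 3}
	leaseholderReplicas := []int{10, 11, 12, 13}

	// Score each store once per rebalance pass: O(stores + replicas) instead
	// of O(stores × replicas) if every replica rescans all stores.
	cached := make(map[int]float64, len(stores))
	for _, s := range stores {
		cached[s] = scoreStore(s)
	}
	for range leaseholderReplicas {
		_ = cached // each replica consults the cached scores
	}
	fmt.Println(cached)
}
```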
pkg/storage/store_pool.go, line 456 at r1 (raw file):
The
BramGruneir (Bram Gruneir):
Without load-based leaseholder rebalancing. I can start some runs with it enabled as well, but I don't think it's going to show a major shift. Consider that the loads run on it are photos and blockwriter.

Review status: all files reviewed at latest revision, 10 unresolved discussions.

pkg/storage/allocator_test.go, line 2025 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yes. Right now, it will try to move as many as it can. And this competes with diversity.

pkg/storage/allocator_test.go, line 2029 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
If there is no suitable rebalance candidate, there will be no action. We don't remove or rebalance a replica if there is no suitable place for it to go. See the final 3 tests, trying to move things to

pkg/storage/client_raft_test.go, line 1134 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
The changes here came from this test failing and my debugging it. I figured the actual change was more or less harmless.

pkg/storage/rule_solver.go, line 322 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
A number already do, actually.

pkg/storage/rule_solver.go, line 341 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Renamed to just

pkg/storage/rule_solver.go, line 354 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yes they are, but that's a good idea regardless. Done.

pkg/storage/rule_solver.go, line 366 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yeah, having a preferred store is not a requirement, but it should more or less guarantee that at least one replica is on a preferred store. Without this, when you change the zone config to include a preference, nothing will happen. But there might be the need for a bit more logic here: I'm going to consider ensuring that if a preferred constraint is already filled, we can ignore it. That should remove all the thrashing.
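A hedged sketch of the "ignore already-filled preferences" idea Bram describes, with invented types rather than the PR's code:

```go
package main

import "fmt"

type constraint struct{ key, value string }

// unmetPreferences drops positive ("preferred") constraints that some
// existing replica already satisfies, so the allocator stops trying to move
// more replicas toward them.
func unmetPreferences(prefs []constraint, replicaAttrs []map[string]string) []constraint {
	var unmet []constraint
	for _, c := range prefs {
		met := false
		for _, attrs := range replicaAttrs {
			if attrs[c.key] == c.value {
				met = true
				break
			}
		}
		if !met {
			unmet = append(unmet, c)
		}
	}
	return unmet
}

func main() {
	prefs := []constraint{{"region", "eur"}}
	replicas := []map[string]string{{"region": "us"}, {"region": "eur"}}
	// The preference is already filled, so nothing is left to act on.
	fmt.Println(unmetPreferences(prefs, replicas)) // []
}
```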
pkg/storage/rule_solver.go, line 368 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Updated.

pkg/storage/rule_solver.go, line 377 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
If we're in the 100 to 1000 range, I think we're OK. When we start pushing 10000, we can revisit it. But the bigger issue would be having a cache in the store pool instead of locking, which will slow down everything that uses the store pool.

pkg/storage/store_pool.go, line 456 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yeah, I added the comment on it in
petermattis (Peter Mattis):
I need to take another look at this when my eyes are fresher.

Review status: 4 of 5 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.

pkg/storage/allocator_test.go, line 2025 at r1 (raw file): Previously, BramGruneir (Bram Gruneir) wrote…
Why does this compete with diversity? Shouldn't one win (either the positive constraint or diversity)? Competing implies they have the same priority, but I don't think we want a level playing field here. By the way, before fixing this I think we should figure out what the desired semantics of a positive constraint are.

pkg/storage/rule_solver.go, line 341 at r1 (raw file): Previously, BramGruneir (Bram Gruneir) wrote…
I think this would be clearer as a single map from storeID to constraint info. FYI, you can define a type within a method (you just can't define any methods on that type).
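The code snippet that accompanied this suggestion did not survive; the following is a guess at the shape Peter meant, with invented field names:

```go
package main

import "fmt"

type constraint struct{ key, value string }

type store struct {
	id    int
	attrs map[string]string
}

func main() {
	// A type defined within a function is legal Go; you just can't define
	// methods on it. Field names here are invented.
	type constraintInfo struct {
		matched int // constraints this store's attributes satisfy
		unmet   int // constraints this store's attributes do not satisfy
	}

	stores := []store{
		{id: 1, attrs: map[string]string{"region": "us"}},
		{id: 2, attrs: map[string]string{"region": "eur"}},
	}
	constraints := []constraint{{"region", "eur"}}

	// One map from store ID to all per-store constraint info, rather than
	// several parallel maps.
	info := make(map[int]constraintInfo, len(stores))
	for _, s := range stores {
		ci := info[s.id]
		for _, c := range constraints {
			if s.attrs[c.key] == c.value {
				ci.matched++
			} else {
				ci.unmet++
			}
		}
		info[s.id] = ci
	}
	fmt.Println(info) // map[1:{0 1} 2:{1 0}]
}
```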
pkg/storage/rule_solver.go, line 377 at r1 (raw file): Previously, BramGruneir (Bram Gruneir) wrote…
I bet this per-store computation will show up in profiles before we hit 1000 stores.

pkg/storage/rule_solver.go, line 353 at r2 (raw file):
I've always found code like this more readable if the assignment and the comparison have the variables on the same side. Above, you're comparing
Ditto for the comparison below. Feel free to ignore.
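The expressions being compared were lost above, so here is a contrived Go example of the style point, not code from the PR:

```go
package main

import "fmt"

func main() {
	scores := map[string]float64{"candidate": 0.4, "existing": 0.7}

	candidate := scores["candidate"]
	existing := scores["existing"]

	// Flipping sides between assignment and comparison reads awkwardly:
	//   if existing > candidate { ... }
	// Keeping candidate on the left, matching its assignment, scans better:
	if candidate < existing {
		fmt.Println("keep the existing placement")
	}
}
```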
a-robinson (Alex Robinson):
Review status: 4 of 5 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/storage/allocator_test.go, line 2025 at r1 (raw file): Previously, BramGruneir (Bram Gruneir) wrote…
And is that the behavior that's agreed upon as what we want?

pkg/storage/allocator_test.go, line 2029 at r1 (raw file): Previously, BramGruneir (Bram Gruneir) wrote…
Yeah, that seems wise. Mind opening an issue or adding it to an existing one?
BramGruneir (Bram Gruneir):
Review status: 4 of 5 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/storage/allocator_test.go, line 2025 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
After further reflection, I think we should redefine a positive constraint as a preference for at least one replica following that constraint. This will require more work, since we won't be able to blindly pick the store with the most positive constraint matches; instead we will have to find the candidates that increase the number of matches (and, when removing, the opposite: the one that takes the least away). Right now, a single positive constraint is equal to a full diversity score (i.e. a top-level locality difference). But if you had two, say

So for the interim, if we penalize the convergence score by 0.01, then we should be able to emulate at least making sure that the preference is respected over diversity.
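One possible reading of that interim 0.01 penalty, as a self-contained sketch with invented names rather than the PR's actual scoring code:

```go
package main

import "fmt"

// rebalanceScore shows one reading of the interim fix: a store that does not
// satisfy the positive constraint scores 0.01 worse than an otherwise-equal
// store that does, so the preference decides between near-tied candidates.
func rebalanceScore(diversityScore float64, matchesPositive bool) float64 {
	const positivePenalty = 0.01
	if !matchesPositive {
		return diversityScore - positivePenalty
	}
	return diversityScore
}

func main() {
	fmt.Println(rebalanceScore(1.0, true))  // 1
	fmt.Println(rebalanceScore(1.0, false)) // 0.99
}
```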
pkg/storage/allocator_test.go, line 2029 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
I'll add it. #14113

pkg/storage/rule_solver.go, line 341 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done.

pkg/storage/rule_solver.go, line 377 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
When we are dealing with 1000-store issues, I'll be very happy to refactor this.

pkg/storage/rule_solver.go, line 353 at r2 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done.
LGTM, but wait for @petermattis's full review.

Reviewed 1 of 1 files at r3.
BramGruneir (Bram Gruneir):
I've pushed a new version in which I've removed all the logic around positive constraints. Once #14163 is closed, I'll follow up with actively rebalancing based on a change in positive constraints.
Based on all my testing, this looks like it's ready to go; it would be great if we could get it into this week's beta.
Review status: 2 of 5 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.
BramGruneir (Bram Gruneir), PR description:
This change allows the allocator to determine when there are replicas on nodes or stores that violate a constraint. This can happen when a zone config is changed after the replicas are already present.
Replaces #13288
I've done a large number of tests on indigo and it performed admirably. I'll add some of those results in a bit.
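For readers skimming the description, here is a standalone sketch of the behavior being added, with invented types rather than CockroachDB's allocator API: given the required constraints from a zone config, flag replicas whose stores no longer satisfy them.

```go
package main

import "fmt"

// Invented types, not CockroachDB's allocator API.
type store struct {
	id    int
	attrs map[string]string
}

type constraint struct{ key, value string }

// violatingStores returns the stores whose attributes break a required
// constraint, e.g. after a zone config change made existing replicas illegal.
func violatingStores(replicaStores []store, required []constraint) []int {
	var out []int
	for _, s := range replicaStores {
		for _, c := range required {
			if s.attrs[c.key] != c.value {
				out = append(out, s.id)
				break
			}
		}
	}
	return out
}

func main() {
	replicas := []store{
		{id: 1, attrs: map[string]string{"disk": "ssd"}},
		{id: 2, attrs: map[string]string{"disk": "hdd"}},
	}
	// The zone config now requires ssd; the replica on store 2 violates it
	// and becomes a candidate for the allocator to move.
	fmt.Println(violatingStores(replicas, []constraint{{"disk", "ssd"}})) // [2]
}
```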