storage: Consider which replica will be removed when adding a replica to improve balance #17971
@a-robinson, here is my understanding; I don't know if it's right.
@a6802739 it won't always be the case that the replica we add is in the same locality as the one that we'll want to remove. For example, if a cluster has localities […]. This will need to be a little more sophisticated. Actually running through the […]
@a-robinson, so what we should do is […]. And what do you mean […]?
Roughly, although we do already have code for prioritizing diversity of localities. The issue here is more that when we're considering whether to do a rebalance, we'll sometimes rebalance to some store x1 on the basis of it being better than store y, even though when it comes time to remove a replica after up-replicating beyond the desired number of replicas we'll end up comparing x1 and x2 (and potentially x3, x4, etc.) to decide which replica to remove. This can mean that we'll remove x1 immediately after adding it, because it may be worse than x2 even though it's much better than y.
When deciding whether to do a rebalance, we run […].
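The add-then-remove behavior described above can be reproduced with toy numbers. The following is a minimal, self-contained sketch, not the actual allocator code: the scores, the naive rebalance check, and the removal rule are simplified stand-ins, but they show how a candidate that grades better than one existing replica can still be the first replica the removal pass evicts.

```go
// Sketch of the thrashing scenario; simplified stand-ins, not allocator code.
package main

import "fmt"

type store struct {
	name     string
	locality string
	score    float64 // higher is better; stands in for the allocator's grading
}

// naiveRebalanceCheck mirrors the current behavior: rebalance onto the
// candidate if it grades better than *any* existing replica, regardless of
// which replica would later be removed.
func naiveRebalanceCheck(existing []store, candidate store) bool {
	for _, s := range existing {
		if candidate.score > s.score {
			return true
		}
	}
	return false
}

// removalTarget mirrors the removal rule: narrow to the most-represented
// locality (to preserve diversity), then remove the worst-scoring replica
// within it.
func removalTarget(replicas []store) store {
	counts := map[string]int{}
	for _, s := range replicas {
		counts[s.locality]++
	}
	worst := replicas[0]
	for _, s := range replicas[1:] {
		if counts[s.locality] > counts[worst.locality] ||
			(counts[s.locality] == counts[worst.locality] && s.score < worst.score) {
			worst = s
		}
	}
	return worst
}

func main() {
	x2 := store{"x2", "X", 0.9}
	y := store{"y", "Y", 0.2}
	z := store{"z", "Z", 0.5}
	x1 := store{"x1", "X", 0.6} // candidate: much better than y, worse than x2

	existing := []store{x2, y, z}
	if naiveRebalanceCheck(existing, x1) {
		fmt.Println("rebalance: add", x1.name) // fires, because x1 beats y
	}
	// After the add, the range is over-replicated and one replica is removed.
	afterAdd := append(existing, x1)
	fmt.Println("remove:", removalTarget(afterAdd).name)
	// Prints x1: locality X now holds two replicas and x1 grades worse than
	// x2, so the replica we just added is removed again.
}
```

Running this prints that x1 is added (it beats y) and then immediately chosen for removal, which is exactly the add-then-remove cycle described in this thread.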
@a-robinson, thank you very much. As you said, if we want to rebalance a replica from store y to store x1, we could just add a replica to store x1 and then remove the replica on store y, so the replicas for this range will not stay up-replicated beyond the desired number of replicas, right?
Yeah, I just don't understand: if x1 is better than y, why don't we just remove the replica from y directly?
Yeah, we could get a replica from […]. Thank you for your kind explanation.
That's true (although the add and remove can't be done atomically). In this hypothetical scenario, the Allocator is essentially making a bad choice (adding x1 when it's just going to be removed again).
We grade stores in three different ways: […]
In this hypothetical scenario, […]
In clusters with multiple localities, we try to maintain diversity by spreading the replicas for each range as evenly across localities as possible. This means that in a cluster with 3 localities and 3 replicas per range, we'll try to keep 1 replica in each locality for each range.
When we're in a rebalancing state where one locality has 2 replicas and the other localities each have 1, we'll always remove a replica from the locality that has 2. This is good.
What isn't as good is that we don't consider this when deciding whether to rebalance in the first place. Our rebalancing logic will kick in if a possible new destination is a better fit than any of the existing replicas, even if the new destination is in a different locality and thus won't actually be a direct replacement for the existing replica that isn't a great fit.
I haven't seen this cause massive problems by itself, but in combination with another problem (I've seen it flare up with both #17879 and #17970) it can make for rebalance thrashing, where we repeatedly add and remove a replica on the same 1 or 2 nodes.
Not all situations will be as straightforward as the 3-locality example above, so we'll have to make sure the fix for this is somewhat more general than just making sure that a potential replica to add is in the same locality as the worst existing replica.
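One possible shape for such a general fix, sketched below under the assumption (mine, not stated in this issue) that the removal decision can be factored out into a reusable function: before committing to a rebalance, run that removal logic over the hypothetical post-add replica set, and skip the rebalance if the replica it would evict is the one we were about to add. Names like `RemovalTarget` and `WorthRebalancing` are illustrative, not the allocator's real API.

```go
// Hypothetical sketch of a general pre-rebalance check; not real allocator API.
package allocator

// StoreID identifies a store; Replica describes an existing or hypothetical
// replica placement.
type StoreID int

type Replica struct {
	Store    StoreID
	Locality string
}

// RemovalTarget stands in for the existing "which replica should be removed
// from this over-replicated range?" logic (locality diversity first, then
// per-store grading).
type RemovalTarget func(replicas []Replica) StoreID

// WorthRebalancing reports whether adding candidate makes sense: it does not
// if the removal pass that follows up-replication would immediately pick the
// candidate itself.
func WorthRebalancing(existing []Replica, candidate Replica, removalTarget RemovalTarget) bool {
	// Pretend the candidate has already been added...
	hypothetical := append(append([]Replica(nil), existing...), candidate)
	// ...and ask the normal removal logic which replica it would evict.
	return removalTarget(hypothetical) != candidate.Store
}
```

A nice property of structuring the check this way is that the rebalance decision reuses whatever grading the removal pass uses (localities, fullness, etc.), so the two decisions can't disagree in the way this issue describes, regardless of how many localities are involved.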