
storage: Consider which replica will be removed when adding a replica to improve balance #17971

Closed
a-robinson opened this issue Aug 28, 2017 · 6 comments

@a-robinson
Contributor

In clusters with multiple localities, we try to maintain diversity by spreading the replicas for each range as evenly across localities as possible. This means that in a cluster with 3 localities and 3 replicas per range, we'll try to keep 1 replica in each locality for each range.

When in a rebalancing state where one locality has 2 replicas and the other localities each have 1, we'll always remove a replica from the locality that has 2. This is good.

What isn't as good is that we don't consider this when deciding whether to rebalance in the first place. Our rebalancing logic will kick in if a possible new destination is a better fit than any of the existing replicas, even if the new destination is in a different locality and thus won't actually be a direct replacement for the existing replica that isn't a great fit.

I haven't seen this cause massive problems by itself, but in combination with another problem (I've seen it flare up with both #17879 and #17970) it can make for rebalance thrashing, where we repeatedly add and remove a replica on the same 1 or 2 nodes.

Not all situations will be as straightforward as the 3-locality example above, so we'll have to make sure the fix for this is somewhat more general than just making sure that a potential replica to add is in the same locality as the worst existing replica.
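
To make "spreading evenly" concrete, here is a minimal sketch of one way to score locality diversity, assuming a simplified pairwise metric (the `diversityScore` helper below is hypothetical, not the allocator's actual scoring code):

```go
package main

import "fmt"

// diversityScore returns the fraction of replica pairs whose members live
// in different localities: 1.0 means every replica has its own locality,
// 0.0 means they all share one. A hypothetical stand-in for the
// allocator's real locality scoring.
func diversityScore(localities []string) float64 {
	if len(localities) < 2 {
		return 1.0
	}
	var pairs, diverse int
	for i := 0; i < len(localities); i++ {
		for j := i + 1; j < len(localities); j++ {
			pairs++
			if localities[i] != localities[j] {
				diverse++
			}
		}
	}
	return float64(diverse) / float64(pairs)
}

func main() {
	// 3 localities, 3 replicas: one replica per locality is optimal.
	fmt.Println(diversityScore([]string{"a", "b", "c"})) // 1
	// The rebalancing state described above: two replicas in one locality.
	fmt.Println(diversityScore([]string{"a", "a", "b"})) // ~0.67
}
```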

@a6802739
Contributor

a6802739 commented Sep 6, 2017

@a-robinson, here is my understanding; I don't know if it's right.
When we need to rebalance, we first add a replica to the range and then remove a replica from it.
So for the situation you described above, we need to make sure the replica we add is in the same locality as the replica we want to remove?

@a-robinson
Contributor Author

@a6802739 it won't always be the case that the replica we add is in the same locality as the one that we'll want to remove. For example, if a cluster has localities a, b, c, and d and a range is in localities a, b, and c, then adding a replica to locality d would be fine.

This will need to be a little more sophisticated. Actually running through the RemoveTarget logic before settling on a RebalanceTarget would be one way you could imagine doing this.
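
Plugging that example into the simplified `diversityScore` sketch from the issue description (a fragment reusing that hypothetical helper):

```go
// The range currently has replicas in localities a, b, and c.
diversityScore([]string{"a", "b", "c", "d"}) // 1.0: adding in d keeps every pair diverse
diversityScore([]string{"a", "b", "c", "a"}) // ~0.83: a second replica in a costs diversity
```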

@a6802739
Contributor

a6802739 commented Sep 7, 2017

@a-robinson, so what we should do is make sure each replica is in a different locality?

And what do you mean by Actually running through the RemoveTarget logic before settling on a RebalanceTarget? Sorry, I don't quite understand this.

@a-robinson
Contributor Author

So what we should do is make sure each replica is in a different locality?

Roughly, although we do already have code for prioritizing diversity of localities. The issue here is more that when we're considering whether to do a rebalance, we'll sometimes rebalance to some store x1 on the basis of it being better than store y, even though when it comes time to remove a replica after up-replicating beyond the desired number of replicas we'll end up comparing x1 and x2 (and potentially x3, x4, etc.) to decide which replica to remove. This can mean that we'll remove x1 immediately after adding it, because it may be worse than x2 even though it's much better than y.

And what do you mean by Actually running through the RemoveTarget logic before settling on a RebalanceTarget? Sorry, I don't quite understand this.

When deciding whether to do a rebalance, we run (*Allocator).RebalanceTarget to try to find a store worth rebalancing to. If it finds one, we'll add a replica to that store. After adding a replica, the range will have more replicas than desired, so the replicate queue will then have to pick a replica to remove from the range. That logic is contained in (*Allocator).RemoveTarget. I was just saying that one way to make sure we won't repeatedly add and remove replicas from the same store would be to simulate running (*Allocator).RemoveTarget before we actually add the replica. If the decision it makes is to remove the replica that we're thinking of adding, then we could avoid adding it.
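
A rough sketch of that dry run, with hypothetical names throughout (`StoreID`, `shouldRebalanceTo`, and the `removeTarget` callback are illustrative stand-ins for the real allocator plumbing, not its actual signatures):

```go
package main

import "fmt"

// StoreID identifies a store (hypothetical stand-in for the real type).
type StoreID int

// shouldRebalanceTo simulates the removal decision before any replica is
// added: pretend the candidate already holds a replica and ask the removal
// logic (standing in for (*Allocator).RemoveTarget) which replica it would
// evict. If the answer is the candidate itself, adding it would just cause
// add/remove thrashing, so report that the rebalance isn't worth doing.
func shouldRebalanceTo(existing []StoreID, candidate StoreID,
	removeTarget func([]StoreID) StoreID) bool {
	simulated := append(append([]StoreID(nil), existing...), candidate)
	return removeTarget(simulated) != candidate
}

func main() {
	// Toy removal policy for demonstration: evict the highest-numbered store.
	removeHighest := func(stores []StoreID) StoreID {
		worst := stores[0]
		for _, s := range stores[1:] {
			if s > worst {
				worst = s
			}
		}
		return worst
	}
	fmt.Println(shouldRebalanceTo([]StoreID{1, 2, 3}, 4, removeHighest)) // false: 4 would be evicted right away
	fmt.Println(shouldRebalanceTo([]StoreID{2, 3, 5}, 1, removeHighest)) // true: 5 would be evicted instead
}
```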

@a6802739
Contributor

a6802739 commented Sep 7, 2017

@a-robinson , Thank you very much.

As you said, we'll sometimes rebalance to some store x1 on the basis of it being better than store y

If we want to rebalance a replica from store y to store x1, we could just add a replica to store x1 and then remove the replica on store y, so the replicas for this range will not be up-replicated beyond the desired number of replicas, right?

This can mean that we'll remove x1 immediately after adding it, because it may be worse than x2 even though it's much better than y

Yeah, I just don't understand: if x1 is better than y, why don't we just remove the replica from y directly?

I was just saying that one way to make sure we won't repeatedly add and remove replicas from the same store would be to simulate running (*Allocator).RemoveTarget before we actually add the replica. If the decision it makes is to remove the replica that we're thinking of adding, then we could avoid adding it.

Yeah, we could get a removal target from (*Allocator).RemoveTarget. But if we don't add the replica beforehand, it can never be the one chosen for removal, because it doesn't exist when we call (*Allocator).RemoveTarget.

Thank you for your kind explanation.

@a6802739 a6802739 self-assigned this Sep 7, 2017
@a-robinson
Contributor Author

If we want to rebalance a replica from store y to store x1, we could just add a replica to store x1 and then remove the replica on store y, so the replicas for this range will not be up-replicated beyond the desired number of replicas, right?

That's true (although the add and remove can't be done atomically). In this hypothetical scenario, the Allocator is essentially making a bad choice (adding x1 because it's better for balance than y) that it doesn't realize is bad (due to diversity reasons) until it comes time to remove one of the replicas.

Yeah, I just don't understand: if x1 is better than y, why don't we just remove the replica from y directly?

We grade stores in three different ways:

  1. Whether they meet all ZoneConfig constraints
  2. How good they are for diversity of Locality
  3. How overfull or underfull they are relative to other nodes in the cluster

In this hypothetical scenario, x1 is better than y for (3), but worse for (2) due to the existence of x2.
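
Since the three criteria are checked in priority order, the comparison is effectively lexicographic. A minimal sketch of that ordering, assuming a hypothetical `candidate` struct (not the actual allocator types):

```go
package main

import "fmt"

// candidate is a hypothetical summary of how a store grades out on the
// three criteria above.
type candidate struct {
	valid     bool    // 1. meets all ZoneConfig constraints
	diversity float64 // 2. locality diversity (higher is better)
	fullness  float64 // 3. how overfull the store is (lower is better)
}

// better compares lexicographically: constraints first, then diversity,
// and only then balance. Balance never outweighs a diversity difference.
func better(a, b candidate) bool {
	if a.valid != b.valid {
		return a.valid
	}
	if a.diversity != b.diversity {
		return a.diversity > b.diversity
	}
	return a.fullness < b.fullness
}

func main() {
	x1 := candidate{valid: true, diversity: 0.5, fullness: 0.6} // shares a locality with x2
	x2 := candidate{valid: true, diversity: 0.5, fullness: 0.3}
	y := candidate{valid: true, diversity: 0.9, fullness: 0.9}
	fmt.Println(better(x1, y))  // false: y wins on diversity despite being the most loaded
	fmt.Println(better(x2, x1)) // true: diversity ties, so balance makes x1 the removal target
}
```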
