[Segment Replication] Allocation changes to distribute primaries - Benchmark & document improvements using opensearch-benchmark. #6210
Looking into this.
@shwetathareja's comment from PR #6325
Have you looked into adding an AllocationConstraint similar to isIndexShardsPerNodeBreached, which essentially adds a constraint to ensure each node shouldn't have more than the avg. no. of shards per index, even if a node has fewer shards overall compared to other nodes? This helps in preventing unnecessary rebalances later. More details are here - #487. I feel you can simply add a constraint for segrep indices to ensure a node shouldn't have more than the avg. no. of expected primaries for an index. Let's say it is a 5-node cluster and the segrep index has 5 primaries + 2 replicas. So no node should have more than 1 primary. This wouldn't require coming up with a magic value for the weight factor. Later, we can evaluate if we want to extend this to balancing the sum of primaries across all indices across all nodes as well. E.g. one segrep index has 3 primaries while another has 2 primaries and it is a 5-node cluster; then each node shouldn't have more than 1 primary across indices as well.
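The arithmetic in the comment above (5 primaries on 5 nodes means at most 1 primary per node) comes down to a ceiling division of primaries by nodes. A minimal sketch, purely illustrative; the class and method names are hypothetical, not from the OpenSearch codebase:

```java
// Sketch: per-index primary cap per node, as in the example above.
// A node would breach the constraint if it held more primaries of the
// index than this ceiling average.
public class PrimaryConstraintExample {
    static long avgPrimaryCap(int primaries, int nodes) {
        // Ceiling division: 5 primaries on 5 nodes -> cap of 1 per node.
        return (primaries + nodes - 1) / nodes;
    }

    public static void main(String[] args) {
        System.out.println(avgPrimaryCap(5, 5)); // → 1
        System.out.println(avgPrimaryCap(3, 5)); // → 1 (second example: 3 primaries, 5 nodes)
    }
}
```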
Yes, this is a great idea! I think having a constraint (here, average per-index primary shard count per node) will help.

Proposal

Approach 1. Introduce an average primary shard count constraint in the existing allocation & rebalancing logic, which ranks nodes based on weights and selects the one with minimum weight as the allocation and/or rebalancing target. A node is marked as breaching the constraint when it contains more than the average number of primary shards of an index. Such nodes are penalized by adding a very high constant to their weight calculation, which lowers the chance of the node being selected as an allocation or rebalancing target. E.g. for a 4-node, 10-shard, 1-replica setup.

Special consideration: during rebalancing, when this constraint is breached, it is advantageous to move a primary first vs. a random shard, to converge towards a smaller delta (weight diff between the heaviest & lightest nodes). Since balancing is performed at the index level, this approach ensures a uniform primary shard distribution per index when rebalancing concludes. Tried a quick POC by updating the weight function (as coded below), and it seems to work, resulting in a balanced primary shard distribution.
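The POC code referenced above did not survive extraction. Below is a hedged sketch of what such a weight-function change could look like: the usual base weight plus a very high constant when a node exceeds the average primary count for the index. All names and the penalty value are illustrative, not the actual OpenSearch implementation:

```java
// Hypothetical sketch of the POC described above: penalize nodes that
// breach the average-primaries-per-node constraint so they rank last as
// allocation/rebalancing targets.
public class WeightSketch {
    static final float CONSTRAINT_PENALTY = 1_000_000f;

    static float weight(int shardsOnNode, float avgShardsPerNode,
                        int primariesOnNode, float avgPrimariesPerNode) {
        float w = shardsOnNode - avgShardsPerNode; // simplified base weight
        if (primariesOnNode > avgPrimariesPerNode) {
            w += CONSTRAINT_PENALTY; // constraint breached: push weight very high
        }
        return w;
    }

    public static void main(String[] args) {
        // Two nodes with equal shard counts; only the one over the primary
        // average gets the penalty, so it loses the "minimum weight" ranking.
        System.out.println(weight(3, 2.5f, 2, 1.25f) > weight(3, 2.5f, 1, 1.25f)); // → true
    }
}
```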
Pros.
Cons.
Approach 2. Perform balancing at the global level, across all indices and nodes, once, using the algorithm below.

Steps
Pros.
Cons.
Approach 3. Combine the previous two approaches so that an overall balance is attained. First, balance the nodes for a single index (Approach 1), followed by another round of balancing (Approach 2).

Pros.
Cons.
- First round results in moving an Index1 shard to Node 2
- Second round moves it back
Conclusion
Approach 1 seems promising and can be extended to solve primary balancing across all indices by adding a new constraint. I will add this as a follow-up of #6422.

Improvements
Problem 1 (More relocations)
Problem 2 (No relocation possible)
@shwetathareja @nknize @Bukhtawar @vigyasharma : Request for feedback. Please feel free to add more folks who can help here.
Thanks @dreamer-89 for exploring the AllocationConstraint approach. A couple of comments on the different approaches:
Approach 2

Approach 3

Overall, I am aligned to start with per-index primaries being evenly distributed, and later iterate to make primaries evenly distributed across indices as well.
Thank you so much @shwetathareja for the feedback! I have implemented Approach 1, which employs the avg primary shard count per node constraint during allocation and rebalancing - PR #6422 (request you to review as well).
State 1

Node 3 is restarted; the coordinator promotes R3 as primary, resulting in the state below. Node 3 joins back the cluster containing R3, R2.

State 2

Node 1 breaches the average primary shard count per node constraint.

State 3

Node 3 now has a higher shard count, which still breaches the constraint.

State 4
👍
I think applying both constraints (avg primary shard count per index and avg primary shard count across all indices) together will be useful as it will result in minimal relocations.
Thank you! Yes, per index primary balancing is implemented in #6422; we can add global balancing constraint as follow up based on discussion here.
Great discussion here, @dreamer-89 and @shwetathareja. This is a tricky, but important problem to solve. Happy to see the original Constraint Based Shard Allocation idea continue to find utility over the years! I think we are on the right track here. I like the option of adding two constraints, one for "primaries per index per node", and another one for "total primary shards per node".
I would strongly suggest thinking hard about how to keep the shard weight algorithm as simple as possible. It is tempting to add more capabilities, but it quickly becomes exponentially complex to reason and debug shard allocation in a production cluster. This was why, in the original impl., I had made each constraint add a fixed high value weight, and modeled constraints as boolean predicates. As we add multiple constraints, there are no clear answers to whether breach of a particular constraint, is worse than the breach of a different constraint. (e.g. is it worse to breach the ...
The constraint framework today is simple - it adds a fixed high value to a node's weight for each breached constraint. That is the central idea. Let a simple default weight function do its work. Then all eligible nodes with no constraints breached are better than ones with 1 constraint breached, which are better than nodes with 2 constraints breached, and so on.

Finally, I think it makes sense to extend the framework to handle rebalances. I did play around with it at the time, but threw it away.
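The predicate-based idea described above can be sketched roughly as follows: each constraint is a boolean predicate, every breached predicate adds the same fixed high weight, so breach count dominates the node ordering and the simple base weight only breaks ties. Types, names, and the second constraint are hypothetical, not the actual OpenSearch classes:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of "constraints as boolean predicates": nodes with fewer breaches
// always weigh less than nodes with more breaches, regardless of base weight.
public class ConstraintFramework {
    record NodeStats(int shards, int primaries, float avgShards, float avgPrimaries) {}

    static final float CONSTRAINT_WEIGHT = 1_000_000f;

    // Illustrative constraint set: per-index primary cap, plus a
    // shards-per-node cap (both predicates, no relative priorities).
    static final List<Predicate<NodeStats>> CONSTRAINTS = List.of(
        n -> n.primaries() > n.avgPrimaries(),
        n -> n.shards() > n.avgShards() + 1
    );

    static float weight(NodeStats n, List<Predicate<NodeStats>> constraints) {
        float base = n.shards() - n.avgShards();                       // simple default weight
        long breached = constraints.stream().filter(c -> c.test(n)).count();
        return base + breached * CONSTRAINT_WEIGHT;                    // fixed bump per breach
    }

    public static void main(String[] args) {
        NodeStats clean = new NodeStats(2, 1, 2.5f, 1.25f);      // no breaches
        NodeStats oneBreach = new NodeStats(2, 2, 2.5f, 1.25f);  // over primary average
        System.out.println(weight(clean, CONSTRAINTS) < weight(oneBreach, CONSTRAINTS)); // → true
    }
}
```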
Re: imbalances that emerge from replica -> primary promotions: I think you need the pri-rep swap capability to truly be able to solve it. There is no other way (that I know of) to solve that problem for a 2-node cluster.
Thank you @vigyasharma for the feedback.
Sorry, I was not clear enough in the example. I meant a state where N1 and N2 have unequal primary shard counts but are still balanced per the weight calculation. Updated the example and copy-pasted it below as well. Overall, I agree: let's keep the constraint as a predicate and not change it until it is really needed.
Yes, I agree. Also, defining relative priority for multiple constraints will be difficult and needs more brainstorming.
Yes, I added the constraint to the weight calculation during rebalancing. Sorry, I was not clear. By extra rebalances I meant extra relocations compared to the existing world where primaries and replicas are treated equally. With the proposal here, we need to move a primary to attain primary balance, followed by another relocation such that the weight diff between any two nodes is within the threshold (default
Yes, agree. Created #6481 to track the discussion.
#6017 introduced a new weight factor to attempt to evenly distribute primary shards. We need to document this change, including benchmarks, to 1) find a reasonable default for SR-enabled indices and 2) prove when using the setting is beneficial.
Related opensearch-project/documentation-website#2663