Auto-expand replicas only after failing nodes #30553

ywelsch · 2018-05-13T10:05:48Z

#30423 combined auto-expansion into the same cluster state update in which nodes are removed. Because the auto-expansion step ran before the dead nodes were disassociated from the routing table, it could remove replicas from live nodes instead of dead ones. This PR reverses the order so that, when nodes leave the cluster, the auto-expand-replica functionality triggers only after the shards on the removed nodes have been failed. This ensures that active shards on other live nodes are not failed if the primary resided on a now-dead node.
Instead, one of the replicas on the live nodes is first promoted to primary, and the auto-expansion (removing replicas) triggers only in a follow-up step (but still within the same cluster state update).

Relates to #30456 (comment) and is a follow-up to #30423
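
The ordering issue described above can be illustrated with a small toy model. This is only a sketch and not the Elasticsearch implementation: the class `AutoExpandOrderSketch`, the `Copy` type, and the methods `failShardsOnDeadNodes`, `autoExpand`, `expandThenFail` and `failThenExpand` are all hypothetical names. It also assumes that auto-expansion targets one replica per live node other than the primary's, and that it drops surplus replicas in routing-table order without knowing which nodes are dead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are Elasticsearch classes or methods.
class AutoExpandOrderSketch {

    /** One copy of a shard: the node holding it and whether it is the primary. */
    static final class Copy {
        final String node;
        final boolean primary;

        Copy(String node, boolean primary) {
            this.node = node;
            this.primary = primary;
        }

        @Override
        public String toString() {
            return node + (primary ? "[P]" : "[R]");
        }
    }

    /** Drops copies on departed nodes; promotes a surviving replica if the primary was lost. */
    static List<Copy> failShardsOnDeadNodes(List<Copy> copies, Set<String> liveNodes) {
        List<Copy> alive = new ArrayList<>();
        boolean hasPrimary = false;
        for (Copy c : copies) {
            if (liveNodes.contains(c.node)) {
                alive.add(c);
                hasPrimary |= c.primary;
            }
        }
        if (!hasPrimary && !alive.isEmpty()) {
            alive.set(0, new Copy(alive.get(0).node, true));
        }
        return alive;
    }

    /** Assumed policy: keep (liveNodeCount - 1) replicas, dropping surplus ones in list order. */
    static List<Copy> autoExpand(List<Copy> copies, int liveNodeCount) {
        int desired = Math.max(0, liveNodeCount - 1);
        int kept = 0;
        List<Copy> result = new ArrayList<>();
        for (Copy c : copies) {
            if (c.primary) {
                result.add(c);
            } else if (kept < desired) {
                result.add(c);
                kept++;
            }
        }
        return result;
    }

    // Scenario: primary on node A, replicas on B and C; node A then leaves the cluster.
    static List<Copy> initialCopies() {
        return Arrays.asList(new Copy("A", true), new Copy("B", false), new Copy("C", false));
    }

    static Set<String> liveNodes() {
        return new HashSet<>(Arrays.asList("B", "C"));
    }

    /** Problematic order: shrink replicas first, then fail shards on the dead node. */
    static List<Copy> expandThenFail() {
        Set<String> live = liveNodes();
        return failShardsOnDeadNodes(autoExpand(initialCopies(), live.size()), live);
    }

    /** Order described in this PR: fail shards on the dead node first, then auto-expand. */
    static List<Copy> failThenExpand() {
        Set<String> live = liveNodes();
        return autoExpand(failShardsOnDeadNodes(initialCopies(), live), live.size());
    }

    public static void main(String[] args) {
        System.out.println("expand then fail: " + expandThenFail());
        System.out.println("fail then expand: " + failThenExpand());
    }
}
```

In this toy scenario, expanding first drops the replica on live node C and leaves a single copy once node A is failed, whereas failing first promotes B to primary and keeps C as a replica, so no active copy on a live node is discarded.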

elasticmachine · 2018-05-13T10:05:49Z

Pinging @elastic/es-distributed

bleskes

LGTM. Thanks for the extra iteration.


ywelsch · 2018-05-14T18:13:05Z

Thanks @bleskes

#30423 combined auto-expansion into the same cluster state update in which nodes are removed. Because the auto-expansion step ran before the dead nodes were disassociated from the routing table, it could remove replicas from live nodes instead of dead ones. This commit reverses the order so that, when nodes leave the cluster, the auto-expand-replica functionality triggers only after the shards on the removed nodes have been failed. This ensures that active shards on other live nodes are not failed if the primary resided on a now-dead node. Instead, one of the replicas on the live nodes is first promoted to primary, and the auto-expansion (removing replicas) triggers only in a follow-up step (but still within the same cluster state update). Relates to #30456 and is a follow-up to #30423

* 6.x:
  Revert "Silence IndexUpgradeIT test failures. (#30430)"
  [DOCS] Remove references to changelog and to highlights
  Revert "Mute ML upgrade test (#30458)"
  [ML] Fix BWC version for backport of #30125
  [Docs] Improve section detailing translog usage (#30573)
  [Tests] Relax allowed delta in extended_stats aggregation (#30569)
  Fail if reading from closed KeyStoreWrapper (#30394)
  [ML] Reverse engineer Grok patterns from categorization results (#30125)
  Derive max composite buffers from max content len
  Update build file due to doc file rename
  SQL: Extract SQL request and response classes (#30457)
  Remove the changelog (#30593)
  Revert "Add deprecation warning for default shards (#30587)"
  Silence IndexUpgradeIT test failures. (#30430)
  Add deprecation warning for default shards (#30587)
  [DOCS] Adds 6.4.0 release highlight pages
  [DOCS] Adds release highlight pages (#30590)
  Docs: Document how to rebuild analyzers (#30498)
  [DOCS] Fixes title capitalization in security content
  LLRest: Add equals and hashcode tests for Request (#30584)
  [DOCS] Fix realm setting names (#30499)
  [DOCS] Fix path info for various security files (#30502)
  Docs: document precision limitations of geo_bounding_box (#30540)
  Fix non existing javadocs link in RestClientTests
  Auto-expand replicas only after failing nodes (#30553)

Auto-expand replicas only after failing nodes

a67c356

ywelsch added the >bug, v7.0.0, :Distributed Coordination/Allocation, and v6.4.0 labels May 13, 2018

ywelsch requested a review from bleskes May 13, 2018 10:05

ywelsch added 2 commits May 14, 2018 14:02

simplify

58d4536

move assertion

8aa820e

bleskes approved these changes May 14, 2018


Merge remote-tracking branch 'elastic/master' into auto-expand-replic…

9d080ae

…as-on-node-removal

ywelsch merged commit d5f028e into elastic:master May 14, 2018

ywelsch mentioned this pull request May 15, 2018

UpgradeClusterClientYamlTestSuiteIT fails intermittently #30456

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
