Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-expand replicas only after failing nodes #30553

Merged
merged 4 commits into from
May 14, 2018

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented May 13, 2018

#30423 combined auto-expansion in the same cluster state update where nodes are removed. As the auto-expansion step would run before deassociating the dead nodes from the routing table, the auto-expansion would possibly remove replicas from live nodes instead of dead ones. This PR reverses the order to ensure that when nodes leave the cluster that the auto-expand-replica functionality only triggers after failing the shards on the removed nodes. This ensures that active shards on other live nodes are not failed if the primary resided on a now dead node.
Instead, one of the replicas on the live nodes first gets promoted to primary, and the auto-expansion (removing replicas) only triggers in a follow-up step (but still same cluster state update).

Relates to #30456 (comment)
and follow-up of #30423

@ywelsch ywelsch added >bug v7.0.0 :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.4.0 labels May 13, 2018
@ywelsch ywelsch requested a review from bleskes May 13, 2018 10:05
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the extra iteration.

@ywelsch ywelsch merged commit d5f028e into elastic:master May 14, 2018
@ywelsch
Copy link
Contributor Author

ywelsch commented May 14, 2018

Thanks @bleskes

ywelsch added a commit that referenced this pull request May 14, 2018
#30423 combined auto-expansion in the same cluster state update where nodes are removed. As
the auto-expansion step would run before deassociating the dead nodes from the routing table, the
auto-expansion would possibly remove replicas from live nodes instead of dead ones. This commit
reverses the order to ensure that when nodes leave the cluster that the auto-expand-replica
functionality only triggers after failing the shards on the removed nodes. This ensures that active
shards on other live nodes are not failed if the primary resided on a now dead node.
Instead, one of the replicas on the live nodes first gets promoted to primary, and the auto-
expansion (removing replicas) only triggers in a follow-up step (but still same cluster state update).

Relates to #30456 and follow-up of #30423
dnhatn added a commit that referenced this pull request May 15, 2018
* 6.x:
  Revert "Silence IndexUpgradeIT test failures. (#30430)"
  [DOCS] Remove references to changelog and to highlights
  Revert "Mute ML upgrade test (#30458)"
  [ML] Fix BWC version for backport of #30125
  [Docs] Improve section detailing translog usage (#30573)
  [Tests] Relax allowed delta in extended_stats aggregation (#30569)
  Fail if reading from closed KeyStoreWrapper (#30394)
  [ML] Reverse engineer Grok patterns from categorization results (#30125)
  Derive max composite buffers from max content len
  Update build file due to doc file rename
  SQL: Extract SQL request and response classes (#30457)
  Remove the changelog (#30593)
  Revert "Add deprecation warning for default shards (#30587)"
  Silence IndexUpgradeIT test failures. (#30430)
  Add deprecation warning for default shards (#30587)
  [DOCS] Adds 6.4.0 release highlight pages
  [DOCS] Adds release highlight pages (#30590)
  Docs: Document how to rebuild analyzers (#30498)
  [DOCS] Fixes title capitalization in security content
  LLRest: Add equals and hashcode tests for Request (#30584)
  [DOCS] Fix realm setting names (#30499)
  [DOCS] Fix path info for various security files (#30502)
  Docs: document precision limitations of geo_bounding_box (#30540)
  Fix non existing javadocs link in RestClientTests
  Auto-expand replicas only after failing nodes (#30553)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants