
Nodes drop their copy of auto-expanded data when coming up, only to sync it again #1873

Closed
avar opened this issue Apr 18, 2012 · 5 comments
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement high hanging fruit

Comments

@avar

avar commented Apr 18, 2012

When you have an index with index.auto_expand_replicas=0-all running on 3 nodes and you bring down one node, the master will reduce the number of replicas from 2 to 1. Then, when the node that just went down comes back up, Elasticsearch on that node will:

  1. Go up, notice that the number of replicas for that index is 1, and promptly drop its own data as redundant.
  2. Meanwhile the master will notice that there is a new node in the cluster and set the number of replicas back to 2.
  3. The node that just dropped its data will now have that same data re-synced to it.

Instead ElasticSearch should:

  1. Go up and wait for the master to adjust the number of replicas, if needed.
  2. Only drop anything after that adjustment is done, if still needed.
  3. Not need to re-sync any data, since it didn't drop it in the brief interim while the master was adjusting the number of replicas from 1 to 2.

This would improve recovery time in setups where a relatively small index is replicated to all nodes for capacity reasons and you bring up a new node that should serve search requests right away.
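For reference, the setup described above can be reproduced like this (a sketch against a local cluster; the index name `my_index` and the `localhost:9200` endpoint are placeholders, not from the original report):

```shell
# Create an index whose replica count auto-expands to all data nodes.
# On a 3-node cluster this gives each shard 2 replicas; when a node
# leaves, the master shrinks that to 1, which triggers the bug above.
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "index.auto_expand_replicas": "0-all"
  }
}'

# Inspect the effective replica count before and after stopping a node:
curl -XGET 'http://localhost:9200/my_index/_settings?pretty'
```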

@clintongormley
Contributor

Yeah, I was just hit by this one too. I wonder what happens if you disable reallocation before shutting down?
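For anyone trying that workaround: reallocation can be toggled through the cluster settings API. A sketch, assuming a recent version where the setting is `cluster.routing.allocation.enable` (older 0.x/1.x releases used `cluster.routing.allocation.disable_allocation` instead):

```shell
# Disable shard allocation before taking the node down...
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ...restart the node, then re-enable allocation once it has rejoined:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```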

@martijnvg
Member

Perhaps scheduling the deletion of the physical shard files for some time after a shard is no longer allocated on a node could help here. That would leave a time window in which the master can react to the node rejoining, and the deletion of the physical shard files can then be cancelled.

@portante

portante commented Mar 7, 2017

@ywelsch, @clintongormley, is this issue going to be addressed for 2.4.x any time soon? Is this issue a problem with 5.x?

@lukas-vlcek
Contributor

@portante I just tested with ES v5.2.2 and I am able to replicate this issue.

jcantrill added a commit to jcantrill/openshift-ansible that referenced this issue Mar 15, 2017
@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
@ywelsch ywelsch added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Mar 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

ywelsch added a commit that referenced this issue May 7, 2018
Auto-expands replicas in the same cluster state update (instead of a follow-up reroute) where nodes are added or removed.

Closes #1873, fixing an issue where nodes drop their copy of auto-expanded data when coming up, only to sync it again later.
ywelsch added a commit that referenced this issue May 7, 2018
colings86 pushed a commit that referenced this issue May 8, 2018