Shard cannot be relocated after setting node exclusion. #57708

Closed
howardhuanghua opened this issue Jun 5, 2020 · 6 comments · Fixed by #57754
Labels
>bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@howardhuanghua
Contributor

howardhuanghua commented Jun 5, 2020

We have hit a shard relocation issue after setting a node exclusion. In our case the original cluster is 6.8.2; to upgrade the cluster, we add the same number of new 7.5.1 nodes and exclude the 6.8.2 nodes.

However, after adding the 7.5.1 nodes and excluding the 6.8.2 nodes in the cluster settings, one shard of the single empty .kibana index could not be relocated successfully. We have run into this issue several times.

Here is the node list after adding the new nodes; there are four 6.8.2 nodes and four 7.5.1 nodes:

[c_log@VM_1_14_centos ~/repository]$ curl "localhost:9200/_cat/nodes?h=version,name,node.role&s=version"
6.8.2 1590650188002472432 dmi
6.8.2 1590650188002472632 dmi
6.8.2 1590650188002472732 dmi
6.8.2 1590650188002472532 dmi
7.5.1 1590650759002483032 dmi
7.5.1 1590650759002483132 dmi
7.5.1 1590650759002482832 dmi
7.5.1 1590650759002482932 dmi

And we set this cluster setting to exclude data from the 6.8.2 nodes:

"transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "10",
          "exclude" : {
            "_name" : "1590650188002472632,1590650188002472732,1590650188002472432,1590650188002472532"
          }
        }
      }
    }

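For reference, a transient exclusion like the one above is normally applied through the cluster settings API; a minimal sketch, using the node names from this cluster, would look like:

# Exclude the four 6.8.2 nodes by name so their shards move to the new 7.5.1 nodes.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "10",
    "cluster.routing.allocation.exclude._name": "1590650188002472632,1590650188002472732,1590650188002472432,1590650188002472532"
  }
}'
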
The cluster is empty and only contains Kibana indices. We can see the single internal .kibana_1 system index, which contains no docs:

[c_log@VM_1_14_centos ~]$ curl localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 5nRyca57QeaIN4O_SerQ7g 1 1 0 0 522b 261b

Finally, the shard 0 replica cannot be relocated to the new node:

[c_log@VM_1_14_centos ~]$ curl localhost:9200/_cat/shards?v
index shard prirep state docs store ip node
.kibana_1 0 p STARTED 0 261b 10.0.0.82 1590650759002483132 (relocated successfully)
.kibana_1 0 r STARTED 0 261b 10.0.0.148 1590650188002472732 (failed shard; it should have been relocated)

On the master and target nodes we see this exception; there is no exception on the source node:

 [2020-05-28T15:26:59,295][WARN ][o.e.i.c.IndicesClusterStateService] [1590650759002483032] [.kibana_1][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.kibana_1][0]: Recovery failed from {1590650759002483132}{o5bJB_gPT6WiEDdt0l-v0Q}{AaaWa5nTQOuOwacnaE5xpA}{10.0.0.82}{10.0.0.82:20839}{di}{temperature=hot, rack=cvm_1_100003, set=100003, region=1, ip=9.10.49.143} into {1590650759002483032}{l42RGM6tSz-3-Dquma5OzQ}{ZZSxRSWXQjOFlETue5UHxQ}{10.0.0.205}{10.0.0.205:29559}{di}{rack=cvm_1_100003, set=100003, ip=9.10.48.33, temperature=hot, region=1}
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:247) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$1.handleException(PeerRecoveryTargetService.java:292) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.PlainTransportFuture.handleException(PlainTransportFuture.java:97) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1120) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:259) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.1.jar:7.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_181]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_181]
Caused by: org.elasticsearch.transport.RemoteTransportException: [1590650759002483132][10.0.0.82:20839][internal:index/shard/recovery/start_recovery]
Caused by: java.lang.IllegalStateException: can't move recovery to stage [FINALIZE]. current stage: [INDEX] (expected [TRANSLOG])
        at org.elasticsearch.indices.recovery.RecoveryState.validateAndSetStage(RecoveryState.java:175) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryState.setStage(RecoveryState.java:206) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.index.shard.IndexShard.finalizeRecovery(IndexShard.java:1718) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$finalizeRecovery$1(RecoveryTarget.java:313) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryTarget.finalizeRecovery(RecoveryTarget.java:294) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FinalizeRecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FinalizeRecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:389) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:280) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.1.jar:7.5.1]
        ... 3 more

The cluster stays in green status after the relocation fails; the shard simply cannot be relocated and remains on the excluded node. This issue is not easy to reproduce.

The key log message is "can't move recovery to stage [FINALIZE]. current stage: [INDEX] (expected [TRANSLOG])"; it seems there is some gap in the recovery process between 6.8 and 7.5.
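
As a side note, the stage of an ongoing or failed recovery can be inspected with the indices recovery API; a minimal example against the index from this report:

# Show per-shard recovery details for .kibana_1, including the stage
# (e.g. INDEX, TRANSLOG, FINALIZE, DONE) and the source/target nodes.
curl "localhost:9200/.kibana_1/_recovery?human&pretty"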

@howardhuanghua howardhuanghua added >bug needs:triage Requires assignment of a team area label labels Jun 5, 2020
@DaveCTurner DaveCTurner added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed needs:triage Requires assignment of a team area label labels Jun 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Recovery)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 5, 2020
@DaveCTurner
Contributor

Strange indeed @howardhuanghua. Can you share the output of GET _settings and GET _stats?level=shards please?
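
In curl form, those two requests would be:

# All index settings in the cluster
curl "localhost:9200/_settings?pretty"
# Index stats broken down to the individual shard copies
curl "localhost:9200/_stats?level=shards&pretty"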

@howardhuanghua
Contributor Author

howardhuanghua commented Jun 5, 2020

@DaveCTurner Thanks for checking this issue. Since it's a customer's production environment, we triggered a retry of the failed allocation and the shard relocated successfully. It's a little hard to reproduce this issue. We have tried the same process in our test environment several times and can't reproduce it so far. But we did hit this issue several times when upgrading from 6.8 to 7.5.
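
The retry mentioned above is presumably the allocation retry of the cluster reroute API, roughly:

# Ask the master to retry allocations that have exceeded the failure limit.
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"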

@DaveCTurner
Contributor

Noted. Would still be useful to see those outputs if the customer is ok with that, especially GET _settings.

@howardhuanghua
Contributor Author

howardhuanghua commented Jun 5, 2020

The original cluster no longer exists. I have re-created a cluster with the same version/configuration and attached the _settings output below FYI. Every time we hit this issue, it is the .kibana_1 replica that cannot be relocated.

{
  ".monitoring-es-6-2020.06.05" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".monitoring-es-6-2020.06.05",
        "format" : "6",
        "max_result_window" : "65536",
        "creation_date" : "1591356090612",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "M08QyORxQe2yqzZLUQGP8Q",
        "version" : {
          "created" : "6080299"
        },
        "codec" : "best_compression",
        "number_of_shards" : "1"
      }
    }
  },
  ".kibana_1" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "number_of_shards" : "1",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".kibana_1",
        "max_result_window" : "65536",
        "creation_date" : "1591356150338",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "hSv5R1ihTlqdgqgDGj02TQ",
        "version" : {
          "created" : "6080299"
        }
      }
    }
  },
  ".monitoring-kibana-6-2020.06.05" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".monitoring-kibana-6-2020.06.05",
        "format" : "6",
        "max_result_window" : "65536",
        "creation_date" : "1591356157143",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "tGCWJhCQSNi8PGkc2MqRyw",
        "version" : {
          "created" : "6080299"
        },
        "codec" : "best_compression",
        "number_of_shards" : "1"
      }
    }
  },
  ".kibana_task_manager" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "number_of_shards" : "1",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".kibana_task_manager",
        "max_result_window" : "65536",
        "creation_date" : "1591356148888",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "XORvOdBIS52hCekFVIogoQ",
        "version" : {
          "created" : "6080299"
        }
      }
    }
  }
}

@ywelsch
Contributor

ywelsch commented Jun 5, 2020

The error message here makes this sound suspiciously like a bug we fixed just a few days ago: https://github.com/elastic/elasticsearch/pull/57187/files#r431071766 (the linked PR fixes another issue, but while @dnhatn was adding more tests, he uncovered that under certain edge conditions we were not properly moving the recovery stage from INDEX to TRANSLOG, which is what you appear to have hit here).

I think we can close this issue, and reopen if this still occurs on newer versions that have the above bug fix.

@ywelsch ywelsch closed this as completed Jun 5, 2020
dnhatn added a commit that referenced this issue Jun 15, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

This issue was addressed in #57187, but not forward-ported to 8.0. 
 
Closes #57708
dnhatn added a commit that referenced this issue Jul 7, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates #57187
Closes #57708
dnhatn added a commit that referenced this issue Jul 7, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates #57187
Closes #57708
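
For context on the sync_id condition mentioned in the commit messages above: whether a shard copy still carries a sync_id marker from a synced flush can be checked in the shard-level stats. A sketch, assuming the commit user data is exposed under the usual "commit" section of the stats output:

# The Lucene commit user data of each shard copy; a "sync_id" entry there means the copy
# was synced-flushed, which is the case where the clean_files step is skipped during peer recovery.
curl "localhost:9200/.kibana_1/_stats?level=shards&filter_path=indices.*.shards.*.commit&pretty"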