Shard cannot be relocated after setting node exclusion. #57708

Closed
howardhuanghua opened this issue Jun 5, 2020 · 6 comments · Fixed by #57754
Labels
>bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@howardhuanghua
Contributor

howardhuanghua commented Jun 5, 2020

We have hit a shard relocation issue after setting a node exclusion. In our case the original cluster is 6.8.2; to upgrade the cluster, we add the same number of new 7.5.1 nodes and exclude the 6.8.2 nodes.

However, after adding the 7.5.1 nodes and excluding the 6.8.2 nodes in the cluster settings, one shard of the single empty .kibana index could not be relocated successfully. We have run into this issue several times.

Here is the node list after adding the new nodes; there are four 6.8.2 nodes and four 7.5.1 nodes:

[c_log@VM_1_14_centos ~/repository]$ curl "localhost:9200/_cat/nodes?h=version,name,node.role&s=version"
6.8.2 1590650188002472432 dmi
6.8.2 1590650188002472632 dmi
6.8.2 1590650188002472732 dmi
6.8.2 1590650188002472532 dmi
7.5.1 1590650759002483032 dmi
7.5.1 1590650759002483132 dmi
7.5.1 1590650759002482832 dmi
7.5.1 1590650759002482932 dmi

And we set this cluster setting to exclude data from the 6.8.2 nodes:

"transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "10",
          "exclude" : {
            "_name" : "1590650188002472632,1590650188002472732,1590650188002472432,1590650188002472532"
          }
        }
      }
    }

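For reference, a transient exclusion like the one above is normally applied through the cluster settings API; a minimal sketch, using the node names from this cluster, would look like:

# Exclude the four 6.8.2 nodes by name so their shards move to the new 7.5.1 nodes.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "10",
    "cluster.routing.allocation.exclude._name": "1590650188002472632,1590650188002472732,1590650188002472432,1590650188002472532"
  }
}'
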
The cluster is empty and only contains Kibana indices. We can see the single internal .kibana_1 system index, which contains no docs:

[c_log@VM_1_14_centos ~]$ curl localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 5nRyca57QeaIN4O_SerQ7g 1 1 0 0 522b 261b

Finally, the shard 0 replica cannot be relocated to the new node:

[c_log@VM_1_14_centos ~]$ curl localhost:9200/_cat/shards?v
index shard prirep state docs store ip node
.kibana_1 0 p STARTED 0 261b 10.0.0.82 1590650759002483132 (relocated successfully)
.kibana_1 0 r STARTED 0 261b 10.0.0.148 1590650188002472732 (failed shard; it should have been relocated)

On the master and target nodes we see this exception; there is no exception on the source node:

 [2020-05-28T15:26:59,295][WARN ][o.e.i.c.IndicesClusterStateService] [1590650759002483032] [.kibana_1][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.kibana_1][0]: Recovery failed from {1590650759002483132}{o5bJB_gPT6WiEDdt0l-v0Q}{AaaWa5nTQOuOwacnaE5xpA}{10.0.0.82}{10.0.0.82:20839}{di}{temperature=hot, rack=cvm_1_100003, set=100003, region=1, ip=9.10.49.143} into {1590650759002483032}{l42RGM6tSz-3-Dquma5OzQ}{ZZSxRSWXQjOFlETue5UHxQ}{10.0.0.205}{10.0.0.205:29559}{di}{rack=cvm_1_100003, set=100003, ip=9.10.48.33, temperature=hot, region=1}
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:247) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$1.handleException(PeerRecoveryTargetService.java:292) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.PlainTransportFuture.handleException(PlainTransportFuture.java:97) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1120) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:259) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.1.jar:7.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_181]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_181]
Caused by: org.elasticsearch.transport.RemoteTransportException: [1590650759002483132][10.0.0.82:20839][internal:index/shard/recovery/start_recovery]
Caused by: java.lang.IllegalStateException: can't move recovery to stage [FINALIZE]. current stage: [INDEX] (expected [TRANSLOG])
        at org.elasticsearch.indices.recovery.RecoveryState.validateAndSetStage(RecoveryState.java:175) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryState.setStage(RecoveryState.java:206) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.index.shard.IndexShard.finalizeRecovery(IndexShard.java:1718) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$finalizeRecovery$1(RecoveryTarget.java:313) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.RecoveryTarget.finalizeRecovery(RecoveryTarget.java:294) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FinalizeRecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FinalizeRecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:389) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:280) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.1.jar:7.5.1]
        ... 3 more

The cluster stays in green status after the relocation fails; the shard simply cannot be relocated and remains on the excluded node. This issue is not easy to reproduce.

The key log message is "can't move recovery to stage [FINALIZE]. current stage: [INDEX] (expected [TRANSLOG])"; it seems there is some gap in the recovery process between 6.8 and 7.5.
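
As a side note, the stage of an ongoing or failed recovery can be inspected with the indices recovery API; a minimal example against the index from this report:

# Show per-shard recovery details for .kibana_1, including the stage
# (e.g. INDEX, TRANSLOG, FINALIZE, DONE) and the source/target nodes.
curl "localhost:9200/.kibana_1/_recovery?human&pretty"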

@howardhuanghua howardhuanghua added >bug needs:triage Requires assignment of a team area label labels Jun 5, 2020
@DaveCTurner DaveCTurner added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed needs:triage Requires assignment of a team area label labels Jun 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Recovery)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 5, 2020
@DaveCTurner
Contributor

Strange indeed @howardhuanghua. Can you share the output of GET _settings and GET _stats?level=shards please?
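
In curl form, those two requests would be:

# All index settings in the cluster
curl "localhost:9200/_settings?pretty"
# Index stats broken down to the individual shard copies
curl "localhost:9200/_stats?level=shards&pretty"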

@howardhuanghua
Contributor Author

howardhuanghua commented Jun 5, 2020

@DaveCTurner Thanks for checking this issue. Since it's a customer's production environment, we triggered a retry of the failed allocation and the shard relocated successfully. It's a little hard to reproduce this issue. We have tried the same process in our test environment several times and can't reproduce it so far. But we did hit this issue several times when upgrading from 6.8 to 7.5.
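
The retry mentioned above is presumably the allocation retry of the cluster reroute API, roughly:

# Ask the master to retry allocations that have exceeded the failure limit.
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"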

@DaveCTurner
Contributor

Noted. Would still be useful to see those outputs if the customer is ok with that, especially GET _settings.

@howardhuanghua
Contributor Author

howardhuanghua commented Jun 5, 2020

The original cluster no longer exists. I have re-created a cluster with the same version/configuration and attached the _settings output below FYI. Every time we hit this issue, it is the .kibana_1 replica that cannot be relocated.

{
  ".monitoring-es-6-2020.06.05" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".monitoring-es-6-2020.06.05",
        "format" : "6",
        "max_result_window" : "65536",
        "creation_date" : "1591356090612",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "M08QyORxQe2yqzZLUQGP8Q",
        "version" : {
          "created" : "6080299"
        },
        "codec" : "best_compression",
        "number_of_shards" : "1"
      }
    }
  },
  ".kibana_1" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "number_of_shards" : "1",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".kibana_1",
        "max_result_window" : "65536",
        "creation_date" : "1591356150338",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "hSv5R1ihTlqdgqgDGj02TQ",
        "version" : {
          "created" : "6080299"
        }
      }
    }
  },
  ".monitoring-kibana-6-2020.06.05" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".monitoring-kibana-6-2020.06.05",
        "format" : "6",
        "max_result_window" : "65536",
        "creation_date" : "1591356157143",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "tGCWJhCQSNi8PGkc2MqRyw",
        "version" : {
          "created" : "6080299"
        },
        "codec" : "best_compression",
        "number_of_shards" : "1"
      }
    }
  },
  ".kibana_task_manager" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "number_of_shards" : "1",
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".kibana_task_manager",
        "max_result_window" : "65536",
        "creation_date" : "1591356148888",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "XORvOdBIS52hCekFVIogoQ",
        "version" : {
          "created" : "6080299"
        }
      }
    }
  }
}

@ywelsch
Contributor

ywelsch commented Jun 5, 2020

The error message here makes this sound suspiciously like a bug we fixed just a few days ago: https://github.com/elastic/elasticsearch/pull/57187/files#r431071766 (the linked PR fixes another issue, but while @dnhatn was adding more tests, he uncovered that under certain edge conditions we were not properly moving the recovery stage from INDEX to TRANSLOG, which is what you appear to have hit here).

I think we can close this issue, and reopen if this still occurs on newer versions that have the above bug fix.

@ywelsch ywelsch closed this as completed Jun 5, 2020
dnhatn added a commit that referenced this issue Jun 15, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

This issue was addressed in #57187, but not forward-ported to 8.0. 
 
Closes #57708
dnhatn added a commit that referenced this issue Jul 7, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates #57187
Closes #57708
dnhatn added a commit that referenced this issue Jul 7, 2020
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates #57187
Closes #57708
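
For context on the sync_id condition mentioned in the commit messages above: whether a shard copy still carries a sync_id marker from a synced flush can be checked in the shard-level stats. A sketch, assuming the commit user data is exposed under the usual "commit" section of the stats output:

# The Lucene commit user data of each shard copy; a "sync_id" entry there means the copy
# was synced-flushed, which is the case where the clean_files step is skipped during peer recovery.
curl "localhost:9200/.kibana_1/_stats?level=shards&filter_path=indices.*.shards.*.commit&pretty"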