New master repeatedly reroute and fetch shard store of recovering replica #40107
Labels
:Distributed Coordination/Allocation
All issues relating to the decision making around placing a shard (both master logic & on the nodes)

If a master fails over and there's a replica performing phase 1 of recovery, then the new master will repeatedly try but fail to fetch the shard store of the recovering replica until that replica completes phase 1. Some consequences of this:

Symptomatic error messages:
Related conditions:

/cc @ywelsch
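To make the failure loop concrete, here is a minimal, hypothetical Java sketch of the pattern the issue describes: the master asks the data node for the recovering replica's store metadata, the request fails while phase 1 of recovery is still running, the failure is treated as a node-level error, and the master reroutes and fetches again. All class and method names in the sketch (ShardStoreFetchLoop, listShardStoreMetadata, reroute, and so on) are invented for illustration and are not the actual Elasticsearch code.

```java
// Hypothetical illustration only. The class and method names below are invented
// for this sketch and are not Elasticsearch APIs.
import java.util.concurrent.CompletableFuture;

class ShardStoreFetchLoop {

    interface DataNodeClient {
        // Asks the data node holding the recovering replica for its on-disk store metadata.
        CompletableFuture<String> listShardStoreMetadata(String shardId);
    }

    private final DataNodeClient client;

    ShardStoreFetchLoop(DataNodeClient client) {
        this.client = client;
    }

    void fetchAndAllocate(String shardId) {
        client.listShardStoreMetadata(shardId).whenComplete((metadata, failure) -> {
            if (failure != null) {
                // While the replica is still in phase 1 of peer recovery, there is no
                // Lucene commit on disk yet, so this call fails every time. Treating
                // that as a node failure schedules another reroute, which triggers
                // another fetch, and so on until phase 1 completes.
                reroute("shard store fetch failed for " + shardId);
                fetchAndAllocate(shardId); // retry: the loop described in this issue
            } else {
                allocateReplicaBasedOn(metadata);
            }
        });
    }

    private void reroute(String reason) {
        // Recompute shard allocation for the cluster (placeholder).
    }

    private void allocateReplicaBasedOn(String storeMetadata) {
        // Use the fetched store metadata to decide how to allocate the replica (placeholder).
    }
}
```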
Comments
dnhatn added the :Distributed Indexing/Recovery label (Anything around constructing a new shard, either from a local or a remote source.) on Mar 15, 2019
Pinging @elastic/es-distributed
dnhatn added the :Distributed Coordination/Allocation label and removed the :Distributed Indexing/Recovery label on Mar 16, 2019
Relates #29140 (comment) and issues linked from there.
ywelsch added a commit that referenced this issue on May 22, 2019
A shard that is undergoing peer recovery is subject to logging warnings of the form org.elasticsearch.action.FailedNodeException: Failed node [XYZ] ... Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ... These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e. there is an IndexShard instance, but no proper IndexCommit just yet). As these failures are currently bubbled up to the master, they cause unnecessary reroutes and confusion amongst users due to being logged as warnings. Closes #40107
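The change described in that commit message can be pictured with a small, hedged sketch: while an IndexShard exists but no Lucene commit has been written yet, the data node can answer the shard-store listing with an empty result instead of letting the IndexNotFoundException escape as a node failure. The sketch below is illustrative only; ShardStoreLister, listStoreMetadata, StoreFilesMetadata, and the Shard interface are invented names rather than the actual Elasticsearch implementation, and it assumes Lucene (for org.apache.lucene.index.IndexNotFoundException) is on the classpath.

```java
// Illustrative sketch of the idea in the commit message above, not Elasticsearch source.
// Only org.apache.lucene.index.IndexNotFoundException is a real class here; it is a
// checked exception (a FileNotFoundException subclass) from the Lucene library.
import org.apache.lucene.index.IndexNotFoundException;

import java.util.Collections;
import java.util.List;

class ShardStoreLister {

    // Hypothetical result type: the file-level metadata of a shard's on-disk store.
    record StoreFilesMetadata(List<String> fileNames) {
        static StoreFilesMetadata empty() {
            return new StoreFilesMetadata(Collections.emptyList());
        }
    }

    // Hypothetical view of a shard as seen by the store-listing code.
    interface Shard {
        boolean recoveryInProgress();
        StoreFilesMetadata readCommittedFiles() throws IndexNotFoundException;
    }

    StoreFilesMetadata listStoreMetadata(Shard shard) throws IndexNotFoundException {
        try {
            return shard.readCommittedFiles();
        } catch (IndexNotFoundException e) {
            if (shard.recoveryInProgress()) {
                // Expected while peer recovery is ongoing: the shard exists but no Lucene
                // commit ("segments_N" file) has been written yet. Answer with an empty
                // store listing so the condition is not reported to the master as a node
                // failure, which would trigger another reroute.
                return StoreFilesMetadata.empty();
            }
            throw e; // genuinely unexpected: let it propagate
        }
    }
}
```

The real fix differs in its details, but the intended effect is the one the commit message states: an expected, transient condition on a recovering shard no longer bubbles up to the master as a warning-level node failure.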
ywelsch added two more commits that referenced this issue on May 22, 2019, each with the same commit message.
gurkankaymak pushed a commit (…2287) to gurkankaymak/elasticsearch that referenced this issue on May 27, 2019; it carries the same commit message and closes elastic#40107.