Shard stuck Initializing #10764

Closed
msimos opened this issue Apr 24, 2015 · 3 comments
Labels
>bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.

Comments

@msimos

msimos commented Apr 24, 2015

On Elasticsearch 1.3.5, one shard went into the initializing state and could not be recovered. The logs show an InvalidAliasNameException on a delete-by-query action for the affected shard:

[21:04:05,703][TRACE][action.deletebyquery     ] [Boobytrap] failure on replica [9424d5fc870d4909adb3b96c5fb21bdc][1]
org.elasticsearch.indices.InvalidAliasNameException: [9424d5fc870d4909adb3b96c5fb21bdc] Invalid alias name [7bde63b5d9268c12368a5bc8f52435a3df26490605d68d0ababf29eedc547262], Unknown alias name was passed to alias Filter
        at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:93)
        at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:452)
        at org.elasticsearch.action.deletebyquery.TransportShardDeleteByQueryAction.shardOperationOnReplica(TransportShardDeleteByQueryAction.java:143)
[21:04:05,704][WARN ][index.engine.internal    ] [Boobytrap] [9424d5fc870d4909adb3b96c5fb21bdc][1] failed engine [deleteByQuery/shard failed on replica]
[21:04:05,720][TRACE][action.index             ] [Boobytrap] failure on replica [9424d5fc870d4909adb3b96c5fb21bdc][1]
org.elasticsearch.index.IndexShardMissingException: [9424d5fc870d4909adb3b96c5fb21bdc][1] missing
        at org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:184)
        at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:230)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:250)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:229)

Later on, it seems that Elasticsearch tries to recover the shard, but this also fails:

[21:05:30,481][WARN ][indices.recovery         ] [Boobytrap] [9424d5fc870d4909adb3b96c5fb21bdc][1] recovery from [[Emil Blonsky][yAfL58lmRz-4m4yBLqPotA][polaris-prod-135-w-esnode-brck7y9opyj5l][inet[/10.0.0.97:9300]]{master=false}] failed
org.elasticsearch.transport.RemoteTransportException: [Emil Blonsky][inet[/10.0.0.97:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [9424d5fc870d4909adb3b96c5fb21bdc][1] Phase[2] Execution failed
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1109)
        at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:637)
        at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:137)
        at org.elasticsearch.indices.recovery.RecoverySource.access$2600(RecoverySource.java:74)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:465)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:451)

Caused by: org.elasticsearch.transport.RemoteTransportException: [Boobytrap][inet[/10.0.0.94:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException: [9424d5fc870d4909adb3b96c5fb21bdc] Invalid alias name [7bde63b5d9268c12368a5bc8f52435a3df26490605d68d0ababf29eedc547262], Unknown alias name was passed to alias Filter
        at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:93)
        at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:452)
        at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:781)
        at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:431)
        at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:410)

@clintongormley
Contributor

Hi @msimos

It looks like an alias was deleted just before the delete-by-query action arrived in the translog. I think your only option now is to delete that translog, I'm afraid.
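
To illustrate the sequence described here, the following is a toy model in Python, not Elasticsearch code. It assumes, as the stack trace suggests, that a logged delete-by-query operation carries the alias name it was issued through and that the alias is resolved again when the operation is replayed. The names `ToyShard`, `InvalidAliasName`, `replay`, and the alias `tenant-a` are hypothetical.

```python
class InvalidAliasName(Exception):
    """Toy stand-in for the InvalidAliasNameException seen in the logs."""


class ToyShard:
    """Hypothetical shard copy: a set of known aliases plus stored docs."""

    def __init__(self, aliases):
        self.aliases = set(aliases)
        self.docs = {}

    def apply(self, op):
        if op["type"] == "index":
            self.docs[op["id"]] = op["source"]
        elif op["type"] == "delete_by_query":
            # Assumption: the alias is looked up at replay time, so an
            # alias deleted in the meantime makes the operation fail.
            for alias in op["filtering_aliases"]:
                if alias not in self.aliases:
                    raise InvalidAliasName(alias)
            self.docs = {
                doc_id: doc
                for doc_id, doc in self.docs.items()
                if not op["match"](doc)
            }


def replay(shard, translog):
    """Apply every logged operation in order; return how many succeeded."""
    applied = 0
    for op in translog:
        shard.apply(op)
        applied += 1
    return applied


translog = [
    {"type": "index", "id": "1", "source": {"user": "a"}},
    {
        "type": "delete_by_query",
        "filtering_aliases": ["tenant-a"],
        "match": lambda doc: doc["user"] == "a",
    },
]

# The copy that still knows the alias replays the log without error.
with_alias = ToyShard(aliases=["tenant-a"])
applied_with_alias = replay(with_alias, translog)

# A copy recovering after the alias was deleted fails on the same log.
without_alias = ToyShard(aliases=[])
try:
    replay(without_alias, translog)
    recovery_error = None
except InvalidAliasName as exc:
    recovery_error = str(exc)
```

In this model every retry of the recovery hits the same logged operation, which is consistent with the shard staying in the initializing state until the translog is removed.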

Delete-by-query has a number of issues, and we're planning on removing it in its current form in 2.0 (see #10067), hopefully to be replaced by something safer.

/cc @mikemccand

@markwalkom
Contributor

This appears to be happening fairly often on one of our customer's clusters.
What can we do to pinpoint and resolve this, or is it a case of not using delete-by-query?
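
One possible way to pinpoint affected shards is to scan node logs for the `failed engine` warning shown in the original report. The sketch below assumes the log lines follow the same layout as that excerpt; the function name `find_failed_shards` and the `SAMPLE` constant are hypothetical, and the pattern may need adjusting for other logging configurations.

```python
import re

# Matches lines shaped like the WARN entry in the original report:
# [time][WARN ][index.engine.internal ] [node] [index][shard] failed engine [reason]
FAILED_ENGINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\[WARN\s*\]\[index\.engine\.internal\s*\]\s+"
    r"\[(?P<node>[^\]]+)\]\s+\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"failed engine\s+\[(?P<reason>.*)\]\s*$"
)


def find_failed_shards(lines):
    """Return one dict per 'failed engine' warning found in the given lines."""
    hits = []
    for line in lines:
        match = FAILED_ENGINE.match(line)
        if match:
            hit = match.groupdict()
            hit["shard"] = int(hit["shard"])
            hits.append(hit)
    return hits


# Line copied from the log excerpt in the original report.
SAMPLE = (
    "[21:04:05,704][WARN ][index.engine.internal    ] [Boobytrap] "
    "[9424d5fc870d4909adb3b96c5fb21bdc][1] failed engine "
    "[deleteByQuery/shard failed on replica]"
)

hits = find_failed_shards([SAMPLE, "unrelated line"])
```

Filtering the results on a reason that starts with `deleteByQuery` narrows the list to failures of the kind reported here.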

@clintongormley
Contributor

This is a case of "don't use delete-by-query", which has been reimplemented in a safer way as a plugin in 2.0. The only solution here is to delete the transaction log.

@clintongormley added the :Distributed Indexing/Distributed and :Distributed Indexing/Engine labels and removed the :Translog and :Distributed Indexing/Distributed labels on Feb 13, 2018