Revisit ILM retry strategy for additional conditions #42824
Comments
Pinging @elastic/es-core-features
This seems to be the same issue we are facing in our prod environment.
It's possible for force merges kicked off by ILM to silently stop (due to a node relocating, for example), in which case the segment count may not reach what the user configured and the subsequent `SegmentCountStep` may wait indefinitely for the expected segment count. Because of this, this commit makes force merges "best effort" and changes the `SegmentCountStep` to simply report (at INFO level) if the merge was not successful. Relates to #42824. Resolves #43245.
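For anyone diagnosing an index that appears stuck here, a quick sketch of how to check which ILM step it is waiting on and whether the merge actually completed (the index name `my-index-000001` is just a placeholder):

```
# Show which ILM step the index is currently waiting on
GET my-index-000001/_ilm/explain

# Show per-shard segment counts to verify whether the force merge completed
GET _cat/segments/my-index-000001?v
```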
I ran into this when running out of disk space on our ECE instance. While it was easy to expand the nodes, it was very user-hostile to make me manually trigger a retry on my 28 failed indexes that have the same ILM policy configured.
We are sorry about the poor experience that you had here. We recognize that problems like this are serious usability issues. We have been making a concerted effort to make Elasticsearch more resilient in the face of errors in a way that requires less intervention from a human: we think that when the system can recover on its own, it should. ILM in particular is an area where we are investing heavily in making the system more resilient to errors so that it recovers automatically.
Thank you! I will make sure to forward this information to my team.
Here's an issue that you can use to track our progress on this work specific to ILM: #48183
Closing this in favor of #48183, where we will track the work for this. |
Regarding 'Make ILM force merging best effort (#43246)': on a cluster with 3 shards across 3 data nodes, a force merge with max_num_segments=1 against an Elasticsearch 7.0.1 cluster takes twice as long as against an Elasticsearch 6.8.13 cluster.
@gaocx2000cn force merging should take roughly the same amount of time; there is no functional difference in force merging between those versions. The only difference would be the Lucene version.
Currently, ILM does not retry on most step errors other than `SnapshotInProgressException`.
The following are a few scenarios users have run into in the field where having a retry strategy for other errors or conditions would be helpful:
ILM will leave an index at the forcemerge action's segment-count step, waiting for the shards to merge.
However, the segment-count step does not have any knowledge of whether there is still an outstanding force merge operation running against the index. ILM does not currently retry forcemerge, so it will just keep waiting in segment-count until either 1) the user runs force merge outside of ILM to complete the merge, or 2) the user instructs ILM to re-run force merge by manually moving the step back to forcemerge.
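As a rough sketch of those two workarounds (the index name and the `warm` phase are assumptions; adjust them to the policy in use):

```
# Workaround 1: run the force merge outside of ILM so the configured segment count is reached
POST my-index-000001/_forcemerge?max_num_segments=1

# Workaround 2: instruct ILM to re-run force merge by moving the index from the
# segment-count step back to the forcemerge step
POST _ilm/move/my-index-000001
{
  "current_step": {
    "phase": "warm",
    "action": "forcemerge",
    "name": "segment-count"
  },
  "next_step": {
    "phase": "warm",
    "action": "forcemerge",
    "name": "forcemerge"
  }
}
```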
If the node has previously hit the flood-stage watermark, then after the admin has addressed the disk usage and removed the read-only/allow-delete block from the affected indices, it may not occur to them that they also have to manually issue an ILM retry against the index that couldn't roll over before due to the block. If the admin removes the block against the index but does not manually reissue an ILM retry, indexing will keep writing to the latest rollover index beyond max_size. As a result, the cluster can end up with an index that is hundreds of GB, with shards well over 100 GB each, causing other issues.
It would be helpful to add a note to https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html#disk-allocator, as part of the example that removes the block, reminding admins to check ILM and issue a manual retry if needed. Though it would be better if ILM could periodically retry so that it resets itself once the block is cleared from the index.
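For reference, a minimal sketch of the manual recovery path described above (index name is a placeholder):

```
# After freeing disk space, clear the flood-stage block from the affected index
PUT my-index-000001/_settings
{
  "index.blocks.read_only_allow_delete": null
}

# ILM will not pick the index back up on its own, so the failed step has to be retried explicitly
POST my-index-000001/_ilm/retry
```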