Revisit ILM retry strategy for additional conditions #42824

Closed
ppf2 opened this issue Jun 3, 2019 · 9 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team

Comments

@ppf2
Member

ppf2 commented Jun 3, 2019

Currently, ILM does not retry on most step errors other than SnapshotInProgressException.

The following are a few scenarios users have run into in the field where a retry strategy for other errors or conditions would be helpful:

  1. Incomplete force merge due to the underlying shard being relocated:
[2019-06-02T06:58:24,171][TRACE][o.e.a.a.i.f.TransportForceMergeAction] [node1] [indices:admin/forcemerge]  executing operation for shard [[shrink-logstash-app1-2019.06.02-000058][0], node[n8R9j8EfRD-C1Y1ipWafcA], relocating [pANpiuX9RiyfemObVUVYNA], [P], s[RELOCATING], a[id=C6yoSsM2T4CyIn0HljR67g, rId=eWz6qi4-QzSRStF3YCMn0w], expected_shard_size[53668519555]]
[2019-06-02T06:58:24,195][TRACE][o.e.a.a.i.f.TransportForceMergeAction] [node1] [indices:admin/forcemerge] failed to execute operation for shard [[shrink-logstash-app1-2019.06.02-000058][0], node[n8R9j8EfRD-C1Y1ipWafcA], relocating [pANpiuX9RiyfemObVUVYNA], [P], s[RELOCATING], a[id=C6yoSsM2T4CyIn0HljR67g, rId=eWz6qi4-QzSRStF3YCMn0w], expected_shard_size[53668519555]]
org.elasticsearch.index.shard.ShardNotFoundException: no such shard
	at org.elasticsearch.index.IndexService.getShard(IndexService.java:236) ~[elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.admin.indices.forcemerge.TransportForceMergeAction.shardOperation(TransportForceMergeAction.java:81) ~[elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.admin.indices.forcemerge.TransportForceMergeAction.shardOperation(TransportForceMergeAction.java:46) ~[elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:436) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:414) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:401) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:299) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$runRequestInterceptors$15(AuthorizationService.java:344) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$notifyListener$1(ListenableFuture.java:97) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) [?:?]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:84) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.ArrayList.forEach(ArrayList.java:1540) [?:?]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:84) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:143) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:109) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.StepListener.onResponse(StepListener.java:62) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.interceptor.ResizeRequestInterceptor.intercept(ResizeRequestInterceptor.java:82) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$runRequestInterceptors$14(AuthorizationService.java:339) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$notifyListener$1(ListenableFuture.java:97) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) [?:?]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:84) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.ArrayList.forEach(ArrayList.java:1540) [?:?]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:84) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:143) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:109) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.StepListener.onResponse(StepListener.java:62) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.interceptor.IndicesAliasesRequestInterceptor.intercept(IndicesAliasesRequestInterceptor.java:102) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.runRequestInterceptors(AuthorizationService.java:345) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.handleIndexActionAuthorizationResult(AuthorizationService.java:322) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$9(AuthorizationService.java:263) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:604) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:579) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.buildIndicesAccessControl(RBACEngine.java:488) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$3(RBACEngine.java:281) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.lambda$getAsync$0(AuthorizationService.java:641) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.resolveIndexNames(AuthorizationService.java:550) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$6(AuthorizationService.java:251) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.lambda$getAsync$0(AuthorizationService.java:641) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:312) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:247) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:639) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:250) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:639) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:273) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.authorizeIndexActionName(RBACEngine.java:297) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.authorizeIndexAction(RBACEngine.java:270) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:261) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:227) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$1(AuthorizationService.java:193) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$1(RBACEngine.java:113) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:285) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.getRoles(RBACEngine.java:119) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:107) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:195) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$inbound$1(ServerTransportFilter.java:150) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:245) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:305) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:316) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:243) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:195) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:138) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:133) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:306) [x-pack-security-6.7.2.jar:6.7.2]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.2.jar:6.7.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.2.jar:6.7.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]

ILM will leave an index at the forcemerge action's segment-count step, waiting for the shards to merge.

      "step_info" : {
        "message" : "Waiting for [1] shards to forcemerge",
        "shards_left_to_merge" : 1
      }

However, the segment-count step has no knowledge of whether a force merge operation is still running against the index. ILM does not currently retry the force merge, so the index keeps waiting in segment-count until either 1) the user runs a force merge outside of ILM to complete it, or 2) the user instructs ILM to re-run the force merge by manually moving the index back to the forcemerge step.
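Option 2 corresponds to the ILM move-to-step API (`POST _ilm/move/<index>`). The sketch below only builds the request and does not send it. The step names `segment-count` and `forcemerge` come from the description above; the `warm` phase and the helper name `build_move_to_forcemerge` are assumptions for illustration, since the issue does not say in which phase the policy runs the force merge.

```python
import json


def build_move_to_forcemerge(index, phase="warm"):
    """Build (method, path, body) for an ILM move-to-step request.

    Assumption: the force merge runs in the `warm` phase; adjust `phase`
    to match the policy actually attached to the index.
    """
    body = {
        # Where the index is stuck now.
        "current_step": {"phase": phase, "action": "forcemerge", "name": "segment-count"},
        # Where ILM should resume: re-run the force merge itself.
        "next_step": {"phase": phase, "action": "forcemerge", "name": "forcemerge"},
    }
    return "POST", f"/_ilm/move/{index}", json.dumps(body)


method, path, body = build_move_to_forcemerge("shrink-logstash-app1-2019.06.02-000058")
print(method, path)
print(body)
```

The printed method, path, and body can be pasted into any HTTP client; checking `current_step` against the output of the ILM explain API first is advisable, because the move is rejected if the index is no longer on that step.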

  2. Not able to roll over even after resolving the read-only/allow delete block caused by the flood stage watermark.

If a node has previously hit the flood stage watermark, then after the admin has addressed the disk usage and removed the read-only/allow delete block from the affected indices, it may not occur to them that they also have to manually issue an ILM retry against the index that could not roll over because of the block. If the admin has removed the block but has not issued a retry in ILM, indexing keeps writing to the latest rollover index beyond max_size. As a result, the cluster can end up with an index that is hundreds of GB in size, with shards well over 100 GB each, which causes other issues.

      "step_info" : {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];",
        "stack_trace" : "ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];]\n\tat org.elasticsearch.cluster.block.ClusterBlocks.indicesBlockedException(ClusterBlocks.java:229)\n\tat org.elasticsearch.action.admin.indices.rollover.TransportRolloverAction.checkBlock(TransportRolloverAction.java:103)\n\tat org.elasticsearch.action.admin.indices.rollover.TransportRolloverAction.checkBlock(TransportRolloverAction.java:67)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:173)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:164)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:141)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:59)\n\tat org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$apply$0(SecurityActionFilter.java:84)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$authorizeRequest$4(SecurityActionFilter.java:169)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$runRequestInterceptors$15(AuthorizationService.java:344)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$notifyListener$1(ListenableFuture.java:97)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat 
org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192)\n\tat java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:84)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1540)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:84)\n\tat org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:143)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:109)\n\tat org.elasticsearch.action.StepListener.onResponse(StepListener.java:62)\n\tat org.elasticsearch.xpack.security.authz.interceptor.ResizeRequestInterceptor.intercept(ResizeRequestInterceptor.java:82)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$runRequestInterceptors$14(AuthorizationService.java:339)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$notifyListener$1(ListenableFuture.java:97)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192)\n\tat java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:84)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1540)\n\tat 
org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:84)\n\tat org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:143)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:109)\n\tat org.elasticsearch.action.StepListener.onResponse(StepListener.java:62)\n\tat org.elasticsearch.xpack.security.authz.interceptor.IndicesAliasesRequestInterceptor.intercept(IndicesAliasesRequestInterceptor.java:102)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.runRequestInterceptors(AuthorizationService.java:345)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.handleIndexActionAuthorizationResult(AuthorizationService.java:322)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$9(AuthorizationService.java:263)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:604)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:579)\n\tat org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.buildIndicesAccessControl(RBACEngine.java:488)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$3(RBACEngine.java:281)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.lambda$getAsync$0(AuthorizationService.java:641)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.resolveIndexNames(AuthorizationService.java:550)\n\tat 
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$6(AuthorizationService.java:251)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.lambda$getAsync$0(AuthorizationService.java:641)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:312)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:247)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:639)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:250)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:639)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:273)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.authorizeIndexActionName(RBACEngine.java:297)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.authorizeIndexAction(RBACEngine.java:270)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:261)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:227)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$1(AuthorizationService.java:193)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)\n\tat 
org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$1(RBACEngine.java:113)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:285)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.getRoles(RBACEngine.java:119)\n\tat org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:107)\n\tat org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:195)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.authorizeRequest(SecurityActionFilter.java:169)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:155)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:245)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:305)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:316)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:243)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:195)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:138)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.applyInternal(SecurityActionFilter.java:152)\n\tat org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:105)\n\tat 
org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165)\n\tat org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139)\n\tat org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81)\n\tat org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87)\n\tat org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76)\n\tat org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)\n\tat org.elasticsearch.xpack.core.ClientHelper.executeWithHeadersAsync(ClientHelper.java:157)\n\tat org.elasticsearch.xpack.indexlifecycle.LifecyclePolicySecurityClient.doExecute(LifecyclePolicySecurityClient.java:55)\n\tat org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)\n\tat org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1269)\n\tat org.elasticsearch.client.support.AbstractClient$IndicesAdmin.rolloverIndex(AbstractClient.java:1777)\n\tat org.elasticsearch.xpack.core.indexlifecycle.WaitForRolloverReadyStep.evaluateCondition(WaitForRolloverReadyStep.java:115)\n\tat org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunner.runPeriodicStep(IndexLifecycleRunner.java:133)\n\tat org.elasticsearch.xpack.indexlifecycle.IndexLifecycleService.triggerPolicies(IndexLifecycleService.java:270)\n\tat org.elasticsearch.xpack.indexlifecycle.IndexLifecycleService.triggered(IndexLifecycleService.java:213)\n\tat org.elasticsearch.xpack.core.scheduler.SchedulerEngine.notifyListeners(SchedulerEngine.java:168)\n\tat org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.run(SchedulerEngine.java:196)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n"
      }

It would help to add a note to https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html#disk-allocator, next to the example that removes the block, reminding admins to check ILM to see whether they need to issue a manual retry. It would be better still if ILM retried periodically, so that it recovers on its own once the block is cleared from the index.
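Until ILM retries on its own, the manual retry can be scripted. A minimal sketch, assuming the response of `GET <index>/_ilm/explain` has already been parsed into a dict in which failed indices report `"step": "ERROR"` with a `step_info` like the excerpt above. The helper name `retry_requests_for_blocked` and the sample data are hypothetical; the function only lists the `POST <index>/_ilm/retry` calls that would be needed and sends nothing.

```python
def retry_requests_for_blocked(explain_response):
    """Return (method, path) pairs for indices whose ILM step failed on a cluster block."""
    requests = []
    for name, info in sorted(explain_response.get("indices", {}).items()):
        step_info = info.get("step_info") or {}
        failed = info.get("step") == "ERROR"
        blocked = step_info.get("type") == "cluster_block_exception"
        if failed and blocked:
            requests.append(("POST", f"/{name}/_ilm/retry"))
    return requests


# Hand-written sample for illustration, not real cluster output.
sample = {
    "indices": {
        "logs-000001": {
            "step": "ERROR",
            "failed_step": "check-rollover-ready",
            "step_info": {"type": "cluster_block_exception"},
        },
        "logs-000002": {"step": "complete"},
    }
}
print(retry_requests_for_blocked(sample))
```

Filtering on the error type keeps the script from blindly retrying steps that failed for unrelated reasons, which may need a different fix before a retry can succeed.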

@ppf2 ppf2 added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Jun 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@nachiket-lab

This seems to be the same issue we are facing in our prod environment.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Jun 14, 2019
It's possible for force merges kicked off by ILM to silently stop (due
to a node relocating for example). In which case, the segment count may
not reach what the user configured. In the subsequent `SegmentCountStep`
waiting for the expected segment count may wait indefinitely. Because of
this, this commit makes force merges "best effort" and then changes the
`SegmentCountStep` to simply report (at INFO level) if the merge was not
successful.

Relates to elastic#42824
Resolves elastic#43245
dakrone added a commit that referenced this issue Jun 17, 2019
It's possible for force merges kicked off by ILM to silently stop (due
to a node relocating for example). In which case, the segment count may
not reach what the user configured. In the subsequent `SegmentCountStep`
waiting for the expected segment count may wait indefinitely. Because of
this, this commit makes force merges "best effort" and then changes the
`SegmentCountStep` to simply report (at INFO level) if the merge was not
successful.

Relates to #42824
Resolves #43245
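
The behavior described in this commit message can be illustrated roughly as follows. This is a Python sketch of the idea only, not the Java code from the commit; the function name `segment_count_check`, its parameters, and the shape of its input are all hypothetical.

```python
import logging

logger = logging.getLogger("ilm.sketch")


def segment_count_check(index, segments_per_shard, max_num_segments):
    """Best-effort check: always report completion, but log shards that were not fully merged.

    `segments_per_shard` maps a shard id to its current segment count.
    Returns (complete, number_of_unmerged_shards).
    """
    unmerged = {
        shard: count
        for shard, count in segments_per_shard.items()
        if count > max_num_segments
    }
    if unmerged:
        # Report at INFO level instead of blocking the lifecycle.
        logger.info(
            "[%s] best effort force merge to [%d] segments did not succeed for %d shard(s): %s",
            index, max_num_segments, len(unmerged), unmerged,
        )
    # Completing regardless lets the index move on instead of waiting indefinitely.
    return True, len(unmerged)


print(segment_count_check("shrink-logstash-app1", {0: 1, 1: 3}, max_num_segments=1))
```

The trade-off in this approach is that an index may leave the force merge action with more segments than configured, in exchange for never getting stuck in the segment-count step.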
@AntonFriberg

I ran into this when running out of disk space on our ECE instance. While it was easy to expand the nodes it was very user-hostile to make me manually trigger a retry on my 28 failed indexes that have the same ILM policy configured.

@jasontedor
Member

> While it was easy to expand the nodes it was very user-hostile to make me manually trigger a retry on my 28 failed indexes that have the same ILM policy configured.

We are sorry about the poor experience that you had here. We have recognized this and problems like this are serious usability issues. We have been making a concerted effort in our system to make Elasticsearch more resilient in the face of errors in a way that requires less intervention from a human: we think when the system can recover on its own, it should. ILM is one area in particular where we are investing heavily and making the system more resilient to errors so the system automatically recovers.

@AntonFriberg

> > While it was easy to expand the nodes it was very user-hostile to make me manually trigger a retry on my 28 failed indexes that have the same ILM policy configured.
>
> We are sorry about the poor experience that you had here. We have recognized this and problems like this are serious usability issues. We have been making a concerted effort in our system to make Elasticsearch more resilient in the face of errors in a way that requires less intervention from a human: we think when the system can recover on its own, it should. ILM is one area in particular where we are investing heavily and making the system more resilient to errors so the system automatically recovers.

Thank you! I will make sure to forward this information to my team.

@jasontedor
Member

> Thank you! I will make sure to forward this information to my team.

Here’s an issue that you can use to track our progress on this work specific to ILM: #48183

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@dakrone
Member

dakrone commented Dec 9, 2020

Closing this in favor of #48183, where we will track the work for this.

@dakrone dakrone closed this as completed Dec 9, 2020
@gaocx2000cn

About 'Make ILM force merging best effort (#43246)'

On clusters with 3 shards across 3 data nodes, a force merge with max_num_segments=1 takes twice as long on an Elasticsearch 7.0.1 cluster as on an Elasticsearch 6.8.13 cluster.
Is this a bug?

@dakrone
Member

dakrone commented Jan 4, 2021

@gaocx2000cn force merging should take roughly the same amount of time; there is no functional difference in force merging between those versions. The only difference would be the Lucene version.
