[CI] TasksIT testGetTaskWaitForCompletionWithoutStoringResult failing #107823

Closed
ldematte opened this issue Apr 24, 2024 · 2 comments · Fixed by #108094
Assignees
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@ldematte
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/hjumq2svctopc/tests/:server:internalClusterTest/org.elasticsearch.action.admin.cluster.node.tasks.TasksIT/testGetTaskWaitForCompletionWithoutStoringResult

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.action.admin.cluster.node.tasks.TasksIT.testGetTaskWaitForCompletionWithoutStoringResult" -Dtests.seed=C8388D030155DCAB -Dtests.locale=de-DE -Dtests.timezone=Indian/Antananarivo -Druntime.java=21

Applicable branches:
8.14

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.action.admin.cluster.node.tasks.TasksIT#testGetTaskWaitForCompletionWithoutStoringResult

Failure excerpt:

java.util.concurrent.ExecutionException: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:21132][cluster:monitor/task/get]

  at __randomizedtesting.SeedInfo.seed([C8388D030155DCAB:AE69DE381328CB67]:0)
  at org.elasticsearch.action.support.PlainActionFuture$Sync.getValue(PlainActionFuture.java:287)
  at org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:274)
  at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:93)
  at org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:1535)
  at org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:1515)
  at org.elasticsearch.action.admin.cluster.node.tasks.TasksIT.waitForCompletionTestCase(TasksIT.java:617)
  at org.elasticsearch.action.admin.cluster.node.tasks.TasksIT.testGetTaskWaitForCompletionWithoutStoringResult(TasksIT.java:565)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:21132][cluster:monitor/task/get]


    Caused by: org.elasticsearch.ResourceNotFoundException: task [q0H4oaguRdaeStczQy8MZQ:169] isn't running and hasn't stored its results

      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.lambda$getFinishedTaskFromIndex$6(TransportGetTaskAction.java:216)
      at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
      at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:179)
      at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
      at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
      at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
      at org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:39)
      at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
      at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
      at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:324)
      at org.elasticsearch.tasks.TaskManager$1.onFailure(TaskManager.java:214)
      at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
      at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
      at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
      at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onFailure(ActionListenerImplementations.java:317)
      at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
      at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
      at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:324)
      at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:103)
      at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
      at org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:196)
      at org.elasticsearch.client.internal.node.NodeClient.executeLocally(NodeClient.java:105)
      at org.elasticsearch.client.internal.node.NodeClient.doExecute(NodeClient.java:83)
      at org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:357)
      at org.elasticsearch.client.internal.FilterClient.doExecute(FilterClient.java:55)
      at org.elasticsearch.client.internal.OriginSettingClient.doExecute(OriginSettingClient.java:43)
      at org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:357)
      at org.elasticsearch.client.internal.support.AbstractClient.get(AbstractClient.java:457)
      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.getFinishedTaskFromIndex(TransportGetTaskAction.java:212)
      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.getRunningTaskFromNode(TransportGetTaskAction.java:140)
      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.doExecute(TransportGetTaskAction.java:92)
      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.doExecute(TransportGetTaskAction.java:60)
      at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96)
      at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
      at org.elasticsearch.action.support.HandledTransportAction.lambda$new$0(HandledTransportAction.java:50)
      at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
      at org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:288)
      at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:273)
      at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:115)
      at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:96)
      at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:821)
      at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:124)
      at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:96)
      at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:61)
      at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:48)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at java.lang.Thread.run(Thread.java:1583)

      Caused by: org.elasticsearch.index.IndexNotFoundException: no such index [.tasks]

        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:555)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1718)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.filterUnavailable(IndexNameExpressionResolver.java:1698)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.resolveExpressions(IndexNameExpressionResolver.java:252)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:340)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:299)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:285)
        at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteSingleIndex(IndexNameExpressionResolver.java:634)
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.<init>(TransportSingleShardAction.java:161)
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:106)
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:53)
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
        at org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:196)
        at org.elasticsearch.client.internal.node.NodeClient.executeLocally(NodeClient.java:105)
        at org.elasticsearch.client.internal.node.NodeClient.doExecute(NodeClient.java:83)
        at org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:357)
        at org.elasticsearch.client.internal.FilterClient.doExecute(FilterClient.java:55)
        at org.elasticsearch.client.internal.OriginSettingClient.doExecute(OriginSettingClient.java:43)
        at org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:357)
        at org.elasticsearch.client.internal.support.AbstractClient.get(AbstractClient.java:457)
        at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.getFinishedTaskFromIndex(TransportGetTaskAction.java:212)
        at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.getRunningTaskFromNode(TransportGetTaskAction.java:140)
        at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.doExecute(TransportGetTaskAction.java:92)
        at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.doExecute(TransportGetTaskAction.java:60)
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
        at org.elasticsearch.action.support.HandledTransportAction.lambda$new$0(HandledTransportAction.java:50)
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
        at org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:288)
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:273)
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:115)
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:96)
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:821)
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:124)
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:96)
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:61)
        at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:48)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:1583)

@ldematte ldematte added :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >test-failure Triaged test failures from CI labels Apr 24, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Apr 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Apr 24, 2024
@ywangd ywangd added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Apr 25, 2024
@ywangd
Member

ywangd commented Apr 25, 2024

The .tasks index was deleted while a request was still trying to access it. Quite possibly a test issue. Changed to low-risk.

@arteam arteam self-assigned this Apr 26, 2024
arteam added a commit that referenced this issue Apr 30, 2024
Make sure the `.tasks` index is created before we start testing task completion
without storing its result. To achieve that, we store a fake task before we start
`waitForCompletionTestCase`.

Resolves #107823
arteam added a commit that referenced this issue May 29, 2024
It seems that the failure (the missing index) has always existed in the test scenario and is supposed to be handled by TransportGetTaskAction.java: it catches IndexNotFoundException and converts it to ResourceNotFoundException, then catches ResourceNotFoundException and returns a snapshot of the task as the response.
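
The two-step exception handling described above can be sketched as follows. This is a simplified illustration, not the actual `TransportGetTaskAction` code: the class `TaskLookupSketch`, the nested exception classes, the `readStoredResult` helper, and the use of a `Map` (or `null`) to stand in for the `.tasks` index are all assumptions made for the example. Only the method names `getFinishedTaskFromIndex` and `waitedForCompletion` and the two exception names come from the issue.

```java
import java.util.Map;

public class TaskLookupSketch {
    // Hypothetical stand-ins for the Elasticsearch exception types named in the issue.
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical helper: a null map models a missing .tasks index.
    static String readStoredResult(Map<String, String> tasksIndex, String taskId) {
        if (tasksIndex == null) {
            throw new IndexNotFoundException(".tasks");
        }
        String stored = tasksIndex.get(taskId);
        if (stored == null) {
            throw new ResourceNotFoundException("task [" + taskId + "] has no stored result", null);
        }
        return stored;
    }

    // Step 1: a missing index is converted into ResourceNotFoundException.
    static String getFinishedTaskFromIndex(Map<String, String> tasksIndex, String taskId) {
        try {
            return readStoredResult(tasksIndex, taskId);
        } catch (IndexNotFoundException e) {
            throw new ResourceNotFoundException(
                "task [" + taskId + "] isn't running and hasn't stored its results", e);
        }
    }

    // Step 2: the "wait for completion" path suppresses ResourceNotFoundException
    // and falls back to the snapshot taken while the task was still running.
    static String waitedForCompletion(Map<String, String> tasksIndex, String taskId, String snapshot) {
        try {
            return getFinishedTaskFromIndex(tasksIndex, taskId);
        } catch (ResourceNotFoundException e) {
            return snapshot;
        }
    }

    public static void main(String[] args) {
        // With no .tasks index, the waiting path still answers with the snapshot.
        System.out.println(waitedForCompletion(null, "node:1", "snapshot-of-running-task"));
    }
}
```

In this sketch, calling `getFinishedTaskFromIndex` directly with a missing index surfaces the `ResourceNotFoundException` to the caller, while going through `waitedForCompletion` returns the snapshot instead.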

In the stack trace, getFinishedTaskFromIndex was called from getRunningTaskFromNode, not from waitedForCompletion, due to a race between the get request and the unblocking request, which are sent asynchronously. I've changed the waitForCompletionTestCase test method to unblock the task only after the request has started waiting for the task's completion, by registering a removal listener. By doing so, we make sure we test the "wait for completion" branch while the task is running.
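
The ordering guarantee described above can be illustrated with plain `java.util.concurrent` primitives. This is a minimal sketch under stated assumptions, not the code from the fix: `UnblockAfterWaitSketch`, `TaskRegistry`, and its methods are hypothetical names, and the idea that the test observes the listener registration through a latch is one possible reading of the description.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class UnblockAfterWaitSketch {
    // Hypothetical stand-in for a task registry that notifies listeners when a task is removed.
    static class TaskRegistry {
        private final List<Runnable> removalListeners = new CopyOnWriteArrayList<>();
        private final CountDownLatch listenerRegistered = new CountDownLatch(1);

        void registerRemovalListener(Runnable listener) {
            removalListeners.add(listener);
            listenerRegistered.countDown(); // signal that a waiter is now in place
        }

        boolean awaitListenerRegistered(long timeout, TimeUnit unit) throws InterruptedException {
            return listenerRegistered.await(timeout, unit);
        }

        void removeTask() {
            removalListeners.forEach(Runnable::run);
        }
    }

    // Runs the scenario and returns the observed event order as a comma-separated string.
    static String runScenario() {
        TaskRegistry registry = new TaskRegistry();
        CountDownLatch unblock = new CountDownLatch(1);
        CountDownLatch waiterNotified = new CountDownLatch(1);
        List<String> events = new CopyOnWriteArrayList<>();

        // The blocked task: finishes only once the test unblocks it.
        Thread task = new Thread(() -> {
            try {
                unblock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            events.add("task-finished");
            registry.removeTask();
        });

        // The asynchronous "get with wait_for_completion" request.
        Thread getRequest = new Thread(() -> {
            events.add("waiting");
            registry.registerRemovalListener(() -> {
                events.add("waiter-notified");
                waiterNotified.countDown();
            });
        });

        try {
            task.start();
            getRequest.start();
            // Key step: unblock only after the waiter has registered its listener.
            if (!registry.awaitListenerRegistered(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("listener was never registered");
            }
            unblock.countDown();
            if (!waiterNotified.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("waiter was never notified");
            }
            task.join();
            getRequest.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return String.join(",", events);
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Without the `awaitListenerRegistered` step, the task could finish before the get request starts waiting, which corresponds to the path seen in the stack trace where the lookup goes straight to the index.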

The part about the missing index seems to be irrelevant, since waitedForCompletion is able to suppress the error and return a snapshot of the running task, which is not possible if getFinishedTaskFromIndex gets called directly from getRunningTaskFromNode.

Resolves #107823