Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] TransformIT testStopWaitForCheckpoint failing #79621

Closed
fcofdez opened this issue Oct 21, 2021 · 7 comments · Fixed by #88696
Closed

[CI] TransformIT testStopWaitForCheckpoint failing #79621

fcofdez opened this issue Oct 21, 2021 · 7 comments · Fixed by #88696
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@fcofdez
Copy link
Contributor

fcofdez commented Oct 21, 2021

Build scan:
https://gradle-enterprise.elastic.co/s/7maeknveofx2c/tests/:x-pack:plugin:transform:qa:multi-node-tests:javaRestTest/org.elasticsearch.xpack.transform.integration.TransformIT/testStopWaitForCheckpoint

Reproduction line:
./gradlew ':x-pack:plugin:transform:qa:multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformIT.testStopWaitForCheckpoint" -Dtests.seed=3D08A69740D0585E -Dtests.locale=sr-Latn-BA -Dtests.timezone=SystemV/AST4 -Druntime.java=16

Applicable branches:
master, 7.15

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.transform.integration.TransformIT&tests.test=testStopWaitForCheckpoint

Failure excerpt:

java.lang.AssertionError: expected:<1> but was:<0>

  at __randomizedtesting.SeedInfo.seed([3D08A69740D0585E:40DD8AFC8C44B268]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:645)
  at org.junit.Assert.assertEquals(Assert.java:631)
  at org.elasticsearch.xpack.transform.integration.TransformIntegTestCase.lambda$waitUntilCheckpoint$0(TransformIntegTestCase.java:236)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1039)
  at org.elasticsearch.xpack.transform.integration.TransformIntegTestCase.waitUntilCheckpoint(TransformIntegTestCase.java:235)
  at org.elasticsearch.xpack.transform.integration.TransformIntegTestCase.waitUntilCheckpoint(TransformIntegTestCase.java:231)
  at org.elasticsearch.xpack.transform.integration.TransformIT.testStopWaitForCheckpoint(TransformIT.java:284)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:567)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at java.lang.Thread.run(Thread.java:831)

@fcofdez fcofdez added Team:ML Meta label for the ML team :ml/Transform Transform >test-failure Triaged test failures from CI labels Oct 21, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@hendrikmuhs
Copy link

The 2 issues on 7.15 are basically timeouts, it fails for an assertAsync which waits 30s for some condition.

-> Looks like a performance issue

The master issue is different, it reports a wrong doc count.

@fcofdez
Copy link
Contributor Author

fcofdez commented Dec 7, 2021

There's been more timeouts in 7.15

@DaveCTurner
Copy link
Contributor

@hendrikmuhs
Copy link

The failing test is a system integration test, it can fail for many reasons. Logs must be inspected for every failure.

The relevant lines for the last one seem to be:

[2022-07-19T09:02:21,170][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] start called with state [{"task_state":"stopped","indexer_state":"stopped","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}].
[2022-07-19T09:02:21,170][INFO ][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] updating state for transform to [{"task_state":"started","indexer_state":"stopped","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}].
[2022-07-19T09:02:21,308][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] successfully updated state for transform to [{"task_state":"started","indexer_state":"stopped","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}].
[2022-07-19T09:02:21,316][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] attempted to set task to stop at checkpoint [true] with state [{"task_state":"started","indexer_state":"started","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}]
[2022-07-19T09:02:21,316][DEBUG][o.e.x.t.t.TransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] transform is at a checkpoint, initiating stop.
[2022-07-19T09:02:21,317][DEBUG][o.e.x.t.t.TransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] persisting stop at checkpoint
[2022-07-19T09:02:21,377][DEBUG][o.e.x.t.t.ClientTransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] Updated state document from [null] to [{seqNo=1,primaryTerm=1,index=.transform-internal-007}]
[2022-07-19T09:02:21,387][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] shutdown of transform requested
[2022-07-19T09:02:21,388][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] stop called with force [false], shouldStopAtCheckpoint [true], state [{"task_state":"started","indexer_state":"started","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}], indexerstate[STARTED]
[2022-07-19T09:02:21,389][DEBUG][o.e.x.t.t.s.TransformScheduledTaskQueue] [javaRestTest-1] remove(transform-wait-for-checkpoint) is a no-op as the task for this transform does not exist
[2022-07-19T09:02:21,397][INFO ][o.e.x.t.t.TransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] transform has stopped.
[2022-07-19T09:02:21,398][DEBUG][o.e.x.t.t.TransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] updating persistent state of transform to [{"task_state":"stopped","indexer_state":"stopped","checkpoint":0,"progress":{"docs_indexed":0,"docs_processed":0},"should_stop_at_checkpoint":false}].
[2022-07-19T09:02:21,402][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] indexer for transform has state [STOPPED]. Ignoring trigger.
[2022-07-19T09:02:21,403][DEBUG][o.e.x.t.t.s.TransformScheduledTaskQueue] [javaRestTest-1] remove(transform-wait-for-checkpoint) is a no-op as the task for this transform does not exist
[2022-07-19T09:02:21,403][INFO ][o.e.x.t.t.TransformPersistentTasksExecutor] [javaRestTest-1] [transform-wait-for-checkpoint] successfully completed and scheduled task in node operation
[2022-07-19T09:02:21,436][DEBUG][o.e.x.t.t.ClientTransformIndexer] [javaRestTest-1] [transform-wait-for-checkpoint] Updated state document from [{seqNo=1,primaryTerm=1,index=.transform-internal-007}] to [{seqNo=2,primaryTerm=1,index=.transform-internal-007}]
[2022-07-19T09:02:21,437][DEBUG][o.e.x.t.t.TransformTask  ] [javaRestTest-1] [transform-wait-for-checkpoint] shutdown of transform requested
[2022-07-19T09:02:21,437][DEBUG][o.e.x.t.t.s.TransformScheduledTaskQueue] [javaRestTest-1] remove(transform-wait-for-checkpoint) is a no-op as the task for this transform does not exist
[2022-07-19T09:02:21,437][WARN ][o.e.p.AllocatedPersistentTask] [javaRestTest-1] attempt to complete task [data_frame/transforms[c]] with id [transform-wait-for-checkpoint] in the [COMPLETED] state
[2022-07-19T09:02:21,523][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:21,557][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:21,615][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:21,666][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:21,986][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:22,265][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:22,809][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:23,859][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:38,272][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API
[2022-07-19T09:02:54,676][DEBUG][o.e.x.t.c.DefaultCheckpointProvider] [javaRestTest-1] [transform-wait-for-checkpoint] Successfully retrieved checkpoints from cluster [] using transform checkpoint API

It looks like the transform gets started and already the next line in the test stops with wait_for_checkpoint=true. This happens too fast, the transform hasn't been able to process anything. It stops at checkpoint 0, but the expectation of the test is to stop at checkpoint 1.

@hendrikmuhs
Copy link

We've discussed the issue: wait_for_checkpoint works as designed, it stops as early as possible. I will fix the test. In future we might add a new parameter where you can specify exactly when a transform should stop.

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Jul 21, 2022
hendrikmuhs pushed a commit that referenced this issue Jul 22, 2022
)

ensure transform has done something before stop gets called with wait_for_checkpoint=true

fixes #79621
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Jul 22, 2022
…stic#88696)

ensure transform has done something before stop gets called with wait_for_checkpoint=true

fixes elastic#79621
elasticsearchmachine pushed a commit that referenced this issue Jul 22, 2022
…th (#88696) (#88704)

* ensure transform has done something before stop gets called with (#88696)

ensure transform has done something before stop gets called with wait_for_checkpoint=true

fixes #79621

* Update TransformIT.java

* Update TransformIT.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml/Transform Transform Team:ML Meta label for the ML team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants