[CI] DocsClientYamlTestSuiteIT test {yaml=reference/rest-api/watcher/put-watch/line_120} failing #99517

Closed
DaveCTurner opened this issue Sep 13, 2023 · 11 comments · Fixed by #106144
Labels
:Data Management/Watcher low-risk An open issue or test failure that is a low risk to future releases Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/rfrmj66wpiq3g/tests/:docs:yamlRestTest/org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT/test%20%7Byaml=reference%2Frest-api%2Fwatcher%2Fput-watch%2Fline_120%7D

Reproduction line:

./gradlew ':docs:yamlRestTest' --tests "org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT.test {yaml=reference/rest-api/watcher/put-watch/line_120}" -Dtests.seed=7E10037999221C8B -Dtests.locale=ms -Dtests.timezone=Etc/GMT-13 -Druntime.java=20

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT&tests.test=test%20%7Byaml%3Dreference/rest-api/watcher/put-watch/line_120%7D

Failure excerpt:

org.elasticsearch.client.ResponseException: method [DELETE], host [http://127.0.0.1:44133], URI [*,-.ds-ilm-history-*?expand_wildcards=open%2Cclosed%2Chidden], status line [HTTP/1.1 400 Bad Request]
Warnings: [this request accesses system indices: [.triggered_watches], but in a future major version, direct access to system indices will be prevented by default]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"index [.ds-.watcher-history-16-2023.09.13-000001] is the write index for data stream [.watcher-history-16] and cannot be deleted"}],"type":"illegal_argument_exception","reason":"index [.ds-.watcher-history-16-2023.09.13-000001] is the write index for data stream [.watcher-history-16] and cannot be deleted"},"status":400}

  at __randomizedtesting.SeedInfo.seed([7E10037999221C8B:F6443CA337DE7173]:0)
  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:313)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
  at org.elasticsearch.test.rest.ESRestTestCase.wipeAllIndices(ESRestTestCase.java:972)
  at org.elasticsearch.test.rest.ESRestTestCase.wipeCluster(ESRestTestCase.java:720)
  at org.elasticsearch.test.rest.ESRestTestCase.cleanUpCluster(ESRestTestCase.java:418)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)

@DaveCTurner DaveCTurner added :Data Management/Watcher >test-failure Triaged test failures from CI labels Sep 13, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 13, 2023
@masseyke
Member

Suspiciously, this test began at 11:59,794 and ended at 12:00,510.

@masseyke
Member

This happens during the cleanup at the end of the test, which deletes all data streams and then deletes all indices. My guess is that, because this run spanned midnight, the watch actually triggered. Since history is now written asynchronously, a history record arrived after the watcher history data stream had been deleted, which caused the data stream to be recreated.
I'm not sure how we can fix this. We could close the bulk processor that writes watcher history entries, but there's no REST API for that.
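The suspected sequence can be illustrated with a small toy model. This is a hypothetical sketch, not Elasticsearch code: `CleanupRaceSketch`, `deleteAllDataStreams`, `flushHistoryRecord` and `deleteAllIndices` are illustrative names, and the model only assumes the behaviour described in this thread (deleting a data stream removes its backing indices, a late asynchronous history write recreates the data stream, and the wildcard index delete refuses to delete a data stream's write index).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the cleanup race; NOT Elasticsearch code.
public class CleanupRaceSketch {

    // data stream name -> backing indices; the last entry is the write index
    final Map<String, List<String>> dataStreams = new LinkedHashMap<>();
    // regular indices that do not belong to any data stream
    final List<String> plainIndices = new ArrayList<>();

    // Cleanup step 1: deleting a data stream also removes its backing indices.
    void deleteAllDataStreams() {
        dataStreams.clear();
    }

    // Stand-in for the asynchronous history writer: indexing into a data
    // stream that no longer exists recreates it with a fresh write index.
    void flushHistoryRecord(String stream, String writeIndex) {
        dataStreams.computeIfAbsent(stream, s -> new ArrayList<>(List.of(writeIndex)));
    }

    // Cleanup step 2: the wildcard index delete is rejected as a whole if it
    // matches the write index of any data stream.
    void deleteAllIndices() {
        if (!dataStreams.isEmpty()) {
            Map.Entry<String, List<String>> e = dataStreams.entrySet().iterator().next();
            List<String> backing = e.getValue();
            String writeIndex = backing.get(backing.size() - 1);
            throw new IllegalArgumentException("index [" + writeIndex
                + "] is the write index for data stream [" + e.getKey() + "] and cannot be deleted");
        }
        plainIndices.clear();
    }

    public static void main(String[] args) {
        CleanupRaceSketch cluster = new CleanupRaceSketch();
        cluster.plainIndices.add("my-test-index"); // hypothetical index name
        cluster.deleteAllDataStreams();
        // The watch fires and its history record is flushed between the two steps.
        cluster.flushHistoryRecord(".watcher-history-16", ".ds-.watcher-history-16-2023.09.13-000001");
        try {
            cluster.deleteAllIndices();
            System.out.println("cleanup succeeded");
        } catch (IllegalArgumentException ex) {
            System.out.println("cleanup failed: " + ex.getMessage());
        }
    }
}
```

Without the late `flushHistoryRecord` call the same cleanup succeeds, which matches the observation that the failure is intermittent and timing-dependent.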

@piergm
Member

piergm commented Oct 4, 2023

This issue presented itself again today:
https://gradle-enterprise.elastic.co/s/7dzvrow5wiwvc

@dakrone dakrone added the low-risk An open issue or test failure that is a low risk to future releases label Oct 12, 2023
@jfreden
Contributor

jfreden commented Jan 22, 2024

This happened again today: https://gradle-enterprise.elastic.co/s/pjox7ahol3brw

@gmarouli
Contributor

gmarouli commented Mar 8, 2024

Since history is now written asynchronously, a history record came in after the watcher history datastream had been deleted, causing the datastream to be recreated.
I'm not sure how we can fix this.

@masseyke Since this is a cleanup, what if we first disable Watcher and then perform the cleanup? Do you think this would work?

Also, the midnight timing might not be relevant, because we also see failures at other times of day.

@gmarouli
Contributor

gmarouli commented Mar 8, 2024

@masseyke Since this is a clean up what if we first disable watcher and then perform the clean up? Do you think this would work?

Never mind, it is a static setting, so it cannot be changed on a running cluster.

@thecoop thecoop reopened this Jul 11, 2024
@gmarouli
Contributor

This has reoccurred on main: https://gradle-enterprise.elastic.co/s/7codb74wbx2vi/tests/task/:docs:yamlRestTest/details/org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT/test%20%7Byaml=reference%2Frest-api%2Fwatcher%2Fput-watch%2Fline_120%7D?top-execution=1

This is not the same failure. Previously the DELETE request itself failed; now the request succeeds (200 OK), but the test fails because the response carries a warning:

org.elasticsearch.client.WarningFailureException: method [DELETE], host [http://127.0.0.1:43461], URI [*,-.ds-ilm-history-*,-.ds-.slm-history-*,-.ds-.watcher-history-*?expand_wildcards=open%2Cclosed%2Chidden], status line [HTTP/1.1 200 OK]
Warnings: [this request accesses system indices: [.triggered_watches], but in a future major version, direct access to system indices will be prevented by default]
{"acknowledged":true}
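Compared with the original failure, the cleanup URI in this newer failure also excludes `.ds-.slm-history-*` and `.ds-.watcher-history-*`, which appears to be why the watcher history write index is no longer targeted and the request now returns 200. The sketch below is a deliberately simplified, hypothetical model of comma-separated index expressions with `*` wildcards and `-` exclusions; it is not Elasticsearch's actual expression resolver, ignores `expand_wildcards` (the real request passes `open,closed,hidden`, which is what lets `*` match hidden dot-prefixed indices), and the index names other than the watcher history one are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Simplified, hypothetical model of resolving "*,-pattern,..." expressions.
// NOT Elasticsearch's real resolver: hidden/closed handling is ignored.
public class ExclusionPatternSketch {

    // Translate a glob with '*' wildcards into a regex; everything else is literal.
    static boolean globMatches(String glob, String name) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Terms are applied left to right: inclusions add matches, "-" terms remove them.
    static List<String> resolve(String expression, List<String> indices) {
        Set<String> selected = new LinkedHashSet<>();
        for (String term : expression.split(",")) {
            if (term.startsWith("-")) {
                String glob = term.substring(1);
                selected.removeIf(name -> globMatches(glob, name));
            } else {
                for (String name : indices) {
                    if (globMatches(term, name)) {
                        selected.add(name);
                    }
                }
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<String> indices = List.of(
            "my-test-index",                                // hypothetical
            ".ds-ilm-history-5-2023.09.13-000001",          // hypothetical
            ".ds-.watcher-history-16-2023.09.13-000001");   // from the original failure
        // Expression from the original failure: watcher history is still selected.
        System.out.println(resolve("*,-.ds-ilm-history-*", indices));
        // Expression from the newer failure: watcher history is excluded.
        System.out.println(resolve("*,-.ds-ilm-history-*,-.ds-.slm-history-*,-.ds-.watcher-history-*", indices));
    }
}
```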

I will check whether we can tolerate this warning, though.
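One way to absorb the warning is a narrow warnings policy on the cleanup request. The Elasticsearch low-level REST client has a `WarningsHandler` interface with `boolean warningsShouldFailRequest(List<String> warnings)`, set via `RequestOptions.Builder#setWarningsHandler`. To keep this sketch self-contained it declares a local interface of the same shape; the policy name and the prefix check are assumptions for illustration, not what the test framework actually does.

```java
import java.util.List;

// Self-contained sketch. The nested interface only mirrors the shape of the
// low-level REST client's WarningsHandler so this compiles on its own.
public class TolerantCleanupWarnings {

    @FunctionalInterface
    interface WarningsHandler {
        boolean warningsShouldFailRequest(List<String> warnings);
    }

    // Hypothetical policy: absorb only the system-index access warning that the
    // wildcard cleanup DELETE triggers; any other warning still fails the request.
    static final String SYSTEM_INDEX_WARNING_PREFIX = "this request accesses system indices:";

    static final WarningsHandler TOLERATE_SYSTEM_INDEX_ACCESS =
        warnings -> warnings.stream().anyMatch(w -> !w.startsWith(SYSTEM_INDEX_WARNING_PREFIX));

    public static void main(String[] args) {
        List<String> observed = List.of(
            "this request accesses system indices: [.triggered_watches], but in a future major version, "
                + "direct access to system indices will be prevented by default");
        System.out.println("fail request? " + TOLERATE_SYSTEM_INDEX_ACCESS.warningsShouldFailRequest(observed));
    }
}
```

This is narrower than the client's built-in `WarningsHandler.PERMISSIVE`, which ignores every warning, so unexpected deprecation warnings from the cleanup would still surface.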

@elasticsearchmachine
Collaborator

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test test {yaml=reference/rest-api/watcher/put-watch/line_120} (0.3% fail rate in 719 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Aug 29, 2024
…=reference/rest-api/watcher/put-watch/line_120} #99517
dakrone pushed a commit to dakrone/elasticsearch that referenced this issue Aug 30, 2024
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Sep 4, 2024
@elasticsearchmachine elasticsearchmachine closed this as not planned Nov 5, 2024
@elasticsearchmachine
Collaborator

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.

8 participants