[CI] MlTrainedModelsUpgradeIT testTrainedModelInference and MLModelDeploymentsUpgradeIT failing #95360

Closed
droberts195 opened this issue Apr 19, 2023 · 5 comments · Fixed by #95778
Labels
:ml Machine learning Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@droberts195
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/txo4anowmywnm/tests/:x-pack:qa:rolling-upgrade:v8.2.0%23twoThirdsUpgradedTest/org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT/testTrainedModelInference

Reproduction line:

./gradlew ':x-pack:qa:rolling-upgrade:v8.2.0#twoThirdsUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT" -Dtests.method="testTrainedModelInference" -Dtests.seed=D0F64DC8162BB439 -Dtests.bwc=true -Dtests.locale=be-BY -Dtests.timezone=Africa/Lubumbashi -Druntime.java=20

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT&tests.test=testTrainedModelInference

Failure excerpt:

org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:34607], URI [_ml/trained_models/_all/_stats], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"uri_parts","suppressed":[{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"uri_parts"},{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"foreach"}]}],"type":"exception","reason":"unexpected failure gathering pipeline information","caused_by":{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"uri_parts","suppressed":[{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"uri_parts"},{"type":"parse_exception","reason":"processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]","processor_type":"foreach"}]}},"status":500}

  at __randomizedtesting.SeedInfo.seed([D0F64DC8162BB439:4F7A036EC7A2D7FA]:0)
  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:313)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
  at org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT.getTrainedModelStats(MlTrainedModelsUpgradeIT.java:103)
  at org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT.testTrainedModelInference(MlTrainedModelsUpgradeIT.java:78)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)

@droberts195 droberts195 added :ml Machine learning >test-failure Triaged test failures from CI labels Apr 19, 2023
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Apr 19, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@davidkyle
Member

Mute in #95363

@davidkyle davidkyle changed the title [CI] MlTrainedModelsUpgradeIT testTrainedModelInference failing [CI] MlTrainedModelsUpgradeIT testTrainedModelInference and MLModelDeploymentsUpgradeIT failing Apr 19, 2023
@davidkyle
Member

I opened #95766 to discuss the root cause.

The ML get trained model stats API stopped parsing the full ingest pipeline in v8.3.1 (#87978), so upgrade tests that start from that version or later will pass. One way to re-enable these tests in the short term is to call get stats only if the starting version is >= 8.3.1.

@droberts195
Contributor Author

> One way to re-enable these tests in the short term is to call get stats only if the starting version is >= 8.3.1.

Yes, I agree this is a good idea.

The underlying problem is nothing to do with ML, so it's bad that we have our upgrade tests disabled because of it.

It's better that we have some coverage for ML trained model upgrades, particularly between 8.7 and 8.8 where we've changed a few things.

Please add a comment to say why the test is skipping versions <= 8.3.0, and a TODO to remove that if the problematic index template is fixed. (I guess the proper fix would be for the new Enterprise Search functionality to only be installed into the cluster once all nodes have been upgraded to a version that understands the syntax. But that's outside our scope to decide.)
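
For illustration, here is a minimal sketch of the short-term gate discussed above, written as plain Java so it runs on its own. `StatsGate`, `parse`, and `shouldCallGetStats` are hypothetical names and not part of the Elasticsearch test framework; the real test would presumably use the version helpers already available in the upgrade test base classes. The only facts taken from this thread are the 8.3.1 cut-off (#87978) and the request for an explanatory comment plus a TODO.

```java
import java.util.Arrays;

/**
 * Illustrative only: StatsGate and its methods are hypothetical names,
 * not part of the Elasticsearch test framework.
 */
final class StatsGate {

    // First version in which the get trained model stats API no longer parses
    // the full ingest pipeline (per #87978, as stated in this thread).
    private static final int[] FIRST_SAFE_VERSION = {8, 3, 1};

    private StatsGate() {}

    /** Parses "major.minor.patch"; a suffix such as "-SNAPSHOT" is ignored. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch but got: " + version);
        }
        int[] parsed = new int[3];
        for (int i = 0; i < 3; i++) {
            parsed[i] = Integer.parseInt(parts[i]);
        }
        return parsed;
    }

    /**
     * Returns true when the cluster's starting version is >= 8.3.1.
     *
     * Starting versions <= 8.3.0 are skipped because, in a mixed cluster, get stats
     * on those versions parses the full ingest pipeline and fails on configuration
     * it does not understand (see #95766).
     * TODO: remove this gate if the problematic index template is fixed.
     */
    static boolean shouldCallGetStats(String startingVersion) {
        // Arrays.compare performs a lexicographic comparison of the int arrays.
        return Arrays.compare(parse(startingVersion), FIRST_SAFE_VERSION) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"8.2.0", "8.3.0", "8.3.1", "8.7.0"}) {
            System.out.println(v + " -> call get stats: " + shouldCallGetStats(v));
        }
    }
}
```

In the test itself, a check like this would wrap only the get stats call, so the rest of the trained model upgrade coverage still runs on older starting versions.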
