
[CI] XPackRestIT test {p0=ml/forecast/Test forecast unknown job} failing #116150

Closed
elasticsearchmachine opened this issue Nov 3, 2024 · 10 comments
Labels
low-risk An open issue or test failure that is a low risk to future releases :ml Machine learning Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:yamlRestTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/forecast/Test forecast unknown job}" -Dtests.seed=F1BDE02137EABDC7 -Dtests.locale=smn -Dtests.timezone=Europe/Uzhgorod -Druntime.java=23

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

org.junit.TestCouldNotBeSkippedException: Test could not be skipped due to other failures

Issue Reasons:

  • [main] 2 consecutive failures in test test {p0=ml/forecast/Test forecast unknown job}
  • [main] 2 failures in test test {p0=ml/forecast/Test forecast unknown job} (100.0% fail rate in 2 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :Search Relevance/Ranking Scoring, rescoring, rank evaluation. >test-failure Triaged test failures from CI labels Nov 3, 2024
elasticsearchmachine added a commit that referenced this issue Nov 3, 2024
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 consecutive failures in test test {p0=ml/forecast/Test forecast unknown job}
  • [main] 2 failures in test test {p0=ml/forecast/Test forecast unknown job} (100.0% fail rate in 2 executions)

Build Scans:

@elasticsearchmachine elasticsearchmachine added needs:risk Requires assignment of a risk label (low, medium, blocker) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Nov 3, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@davidkyle davidkyle added :ml Machine learning and removed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Nov 4, 2024
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Nov 4, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Nov 4, 2024
@davidkyle
Member

The failure is due to an assertion error, visible in the logs, that took down the node:

[2024-11-01T02:46:22,021][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [yamlRestTest-0] fatal error in thread [elasticsearch[yamlRestTest-0][system_critical_write][T#3]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.index.mapper.IgnoredSourceFieldMapper.postParse(IgnoredSourceFieldMapper.java:161) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:190) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:136) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:113) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1043) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:984) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:928) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:378) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:237) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:305) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:153) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:80) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:220) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1575) ~[?:?]

yamlRestTest.log

@davidkyle davidkyle added :StorageEngine/Logs You know, for Logs and removed :ml Machine learning Team:ML Meta label for the ML team labels Nov 4, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es kkrik-es added :ml Machine learning Team:ML Meta label for the ML team Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings and removed Team:StorageEngine :StorageEngine/Logs You know, for Logs :ml Machine learning Team:ML Meta label for the ML team labels Nov 4, 2024
@kkrik-es kkrik-es self-assigned this Nov 4, 2024
@kkrik-es
Contributor

kkrik-es commented Nov 4, 2024

@davidkyle thanks for looking. The stack trace above seems like an issue with synthetic source indeed.

I just synced and can't reproduce the issue. Did you just use the command above, in main? If it doesn't reproduce any more, I'm tempted to unmute and see if it'll come back.

@kkrik-es
Contributor

kkrik-es commented Nov 4, 2024

Btw the first failure link above points to a different error:

REPRODUCE WITH: ./gradlew ":x-pack:plugin:yamlRestTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/forecast/Test forecast unknown job}" -Dtests.seed=F1BDE02137EABDC7 -Dtests.locale=smn -Dtests.timezone=Europe/Uzhgorod -Druntime.java=23

XPackRestIT > test {p0=ml/forecast/Test forecast unknown job} FAILED
    org.junit.TestCouldNotBeSkippedException: Test could not be skipped due to other failures
        at org.junit.runners.model.MultipleFailureException.<init>(MultipleFailureException.java:36)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1014)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1575)

        Caused by:
        org.junit.AssumptionViolatedException: [ml/forecast/Test forecast unknown job] skipped, reason: [https://github.com/elastic/elasticsearch/issues/34747]

The second link is from #116049, which contains synthetic source changes, so it's unrelated.

jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Nov 4, 2024
@kkrik-es
Contributor

kkrik-es commented Nov 5, 2024

Assigning back to @davidkyle since this doesn't seem like an issue with synthetic source. Please assign it back to me if there's a failure outside a PR, or another repro that points to a parsing exception.

@kkrik-es kkrik-es removed their assignment Nov 5, 2024
@kkrik-es kkrik-es added :ml Machine learning Team:ML Meta label for the ML team and removed Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Nov 5, 2024
@davidkyle
Member

Thanks for the investigation @kkrik-es

@davidkyle
Member

The failing test is actually muted; the TestCouldNotBeSkippedException means that the error occurred in either the test setup or the teardown. In this case it is a search_phase_execution_exception in the teardown.

    org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:35139], URI [/_ml/trained_models/_stats?size=10000], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[],"type":"exception","reason":"Searching for stats for models [lang_ident_model_1] failed","caused_by":{"type":"search_phase_execution_exception","reason":"","phase":"query","grouped":true,"failed_shards":[],"caused_by":{"type":"search_phase_execution_exception","reason":"Search rejected due to missing shards [[.ml-stats-000001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.","phase":"query","grouped":true,"failed_shards":[]}}},"status":500}
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:317)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:292)
        at app//org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.deleteAllTrainedModelIngestPipelines(MlRestTestStateCleaner.java:43)
        at app//org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.resetFeatures(MlRestTestStateCleaner.java:34)
        at app//org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.clearMlState(AbstractXPackRestTest.java:138)
        at app//org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.cleanup(AbstractXPackRestTest.java:118)
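Why a muted test still fails can be sketched in plain Java. This is a simplified, hypothetical model, not JUnit's or randomizedtesting's actual code: `AssumptionViolated`, `CouldNotBeSkipped` and `SkipModel` are stand-ins invented for illustration. It mirrors what the earlier trace shows, where `MultipleFailureException.<init>` turns the skip (an `AssumptionViolatedException` carrying the mute reason) into a `TestCouldNotBeSkippedException` once a second error, such as this teardown failure, is collected alongside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.junit.AssumptionViolatedException (a skipped/muted test).
class AssumptionViolated extends RuntimeException {
    AssumptionViolated(String message) {
        super(message);
    }
}

// Hypothetical stand-in for org.junit.TestCouldNotBeSkippedException.
class CouldNotBeSkipped extends RuntimeException {
    CouldNotBeSkipped(AssumptionViolated cause) {
        super("Test could not be skipped due to other failures", cause);
    }
}

final class SkipModel {
    private SkipModel() {}

    // Runs the test body and then the teardown, collecting every error thrown.
    static List<Throwable> run(Runnable body, Runnable teardown) {
        List<Throwable> errors = new ArrayList<>();
        try {
            body.run();
        } catch (Throwable t) {
            errors.add(t);
        }
        try {
            teardown.run(); // teardown runs even if the body skipped or failed
        } catch (Throwable t) {
            errors.add(t);
        }
        if (errors.size() <= 1) {
            // A lone skip is reported as a skip, a lone failure as a failure.
            return errors;
        }
        // With several errors the skip can no longer be honored, so it is
        // converted into a failure that keeps the original skip as its cause.
        List<Throwable> reported = new ArrayList<>();
        for (Throwable t : errors) {
            if (t instanceof AssumptionViolated) {
                reported.add(new CouldNotBeSkipped((AssumptionViolated) t));
            } else {
                reported.add(t);
            }
        }
        return reported;
    }
}
```

With only the mute assumption failing, `run` returns the skip unchanged; add a failing teardown (here, the `_ml/trained_models/_stats` call hitting missing shards) and the skip is reported as a failure, which is how a muted test can still show up red in CI.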

@davidkyle davidkyle added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Nov 5, 2024