behavioral_analytics-events-final_pipeline parsing fails in a mixed cluster #95766

Closed
davidkyle opened this issue May 3, 2023 · 10 comments · Fixed by #95780 or #96497
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP :EnterpriseSearch/Application Enterprise Search Team:Data Management Meta label for data/management team Team:Enterprise Search Meta label for Enterprise Search team


@davidkyle
Member

davidkyle commented May 3, 2023

Elasticsearch Version

main

Installed Plugins

No response

Java Version

bundled

OS Version

macOS

Problem Description

The pipeline defined in behavioral_analytics-events-final_pipeline.json uses the new ignore_missing setting that #95068 added to the uri_parts ingest processor.

The pipeline is referenced in the index template behavioral_analytics-events-settings.json.

"final_pipeline": "behavioral_analytics-events-final_pipeline",
...

That template may be PUT as a node is upgraded during a rolling upgrade. In that mixed-cluster scenario, only the upgraded nodes recognise the new ignore_missing setting.
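The failure in the log below can be illustrated with a simplified model. The sketch is hypothetical: StrictProcessorConfigParser and ConfigParseException are invented names, not Elasticsearch classes, and the set of uri_parts keys each node understands is an assumed subset. It only mimics the behaviour visible in the log: a node consumes the configuration keys it recognises and rejects the processor if anything is left over, so a definition containing ignore_missing parses on an upgraded node but fails on an older one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of strict processor-config parsing.
// NOT the Elasticsearch ConfigurationUtils API; it only mimics the behaviour
// seen in the log: recognised keys are consumed, and any key left over
// makes parsing fail.
final class StrictProcessorConfigParser {

    static final class ConfigParseException extends RuntimeException {
        ConfigParseException(String message) {
            super(message);
        }
    }

    private final String processorType;
    private final Set<String> supportedKeys;

    StrictProcessorConfigParser(String processorType, Set<String> supportedKeys) {
        this.processorType = processorType;
        this.supportedKeys = supportedKeys;
    }

    /** Returns the recognised settings, or throws if any key is unknown to this node. */
    Map<String, Object> parse(Map<String, Object> config) {
        Map<String, Object> remaining = new HashMap<>(config); // work on a copy
        Map<String, Object> accepted = new HashMap<>();
        for (String key : supportedKeys) {
            if (remaining.containsKey(key)) {
                accepted.put(key, remaining.remove(key));
            }
        }
        if (!remaining.isEmpty()) {
            // Same wording as the log line, e.g. "... parameters [ignore_missing]"
            throw new ConfigParseException("processor [" + processorType
                + "] doesn't support one or more provided configuration parameters "
                + remaining.keySet());
        }
        return accepted;
    }
}
```

Constructing one parser with an older key set and one with that set plus ignore_missing, then feeding both the same configuration map, reproduces the split seen in the mixed cluster: only the older parser throws, with a message ending in [ignore_missing].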

Steps to Reproduce

The problem appears in the rolling upgrade test failure tracked in #95360:

./gradlew ':x-pack:qa:rolling-upgrade:v8.2.0#twoThirdsUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.MlTrainedModelsUpgradeIT" -Dtests.method="testTrainedModelInference" -Dtests.seed=D0F64DC8162BB439 -Dtests.bwc=true -Dtests.locale=be-BY -Dtests.timezone=Africa/Lubumbashi -Druntime.java=20

Logs (if relevant)

[2023-05-02T19:25:31,393][WARN ][o.e.i.IngestService      ] [v8.0.1-2] failed to update ingest pipelines
org.elasticsearch.ElasticsearchParseException: processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]
	at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:588) ~[elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:547) ~[elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:467) ~[elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.ingest.Pipeline.create(Pipeline.java:82) ~[elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.ingest.IngestService.innerUpdatePipelines(IngestService.java:922) ~[elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.ingest.IngestService.applyClusterState(IngestService.java:898) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:531) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) [elasticsearch-8.0.1.jar:8.0.1]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) [elasticsearch-8.0.1.jar:8.0.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
	Suppressed: org.elasticsearch.ElasticsearchParseException: processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:588) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:547) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:467) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.Pipeline.create(Pipeline.java:82) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.IngestService.innerUpdatePipelines(IngestService.java:922) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.IngestService.applyClusterState(IngestService.java:898) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:531) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) [elasticsearch-8.0.1.jar:8.0.1]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
	Suppressed: org.elasticsearch.ElasticsearchParseException: processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:588) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.common.ForEachProcessor$Factory.create(ForEachProcessor.java:187) ~[?:?]
		at org.elasticsearch.ingest.common.ForEachProcessor$Factory.create(ForEachProcessor.java:168) ~[?:?]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:583) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:547) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:467) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.Pipeline.create(Pipeline.java:82) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.IngestService.innerUpdatePipelines(IngestService.java:922) ~[elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.ingest.IngestService.applyClusterState(IngestService.java:898) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:531) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) [elasticsearch-8.0.1.jar:8.0.1]
		at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) [elasticsearch-8.0.1.jar:8.0.1]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
@davidkyle davidkyle added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP :EnterpriseSearch/Application Enterprise Search labels May 3, 2023
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Enterprise Search Meta label for Enterprise Search team labels May 3, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/enterprise-search (Team:Enterprise Search)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@droberts195
Contributor

That template may be PUT as a node is upgraded during a rolling upgrade, in that mixed cluster scenario only the upgraded nodes recognise the new ignore_missing setting.

My reading of this sentence is that this is not just a quirk of the test framework but a real problem that could affect production customers during rolling upgrades if we release like this.

I haven't spent long investigating this, and I am not sure how bad the effect would be in the mixed-version state, but I wanted to note that this issue deserves a proper investigation before 8.8.0 is released. For example, if the cluster was ingesting data during the rolling upgrade, would ingest break in the period between the new ingest pipeline being installed and the ingest nodes being upgraded to the newer version?

An approach the ML team has used for rolling upgrade situations like this is to not start using new functionality until the entire cluster has been upgraded to the newer version.

@davidkyle
Member Author

I do not know whether this breaks ingest pipelines on the older nodes (possibly), but one consequence is that the final pipeline will not be available on older nodes. The expectation that documents are processed through the final pipeline no longer necessarily holds, so there is now uncertainty about what the index contains.

I agree in this case it is best to wait for the cluster to be fully upgraded.

@jimczi
Contributor

jimczi commented May 3, 2023

Great catch, thanks @davidkyle!
@afoucret @joemcelroy, can you take a look? I don't think this breaks other ingest pipelines on the older nodes, since the exception is caught and re-thrown at the end, but it's worth checking.
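The "caught and re-thrown at the end" behaviour can be sketched as a loop that builds each pipeline independently, collects failures, and throws them together once every definition has been tried. This is a hypothetical illustration (PipelineUpdater and updateAll are invented names, not IngestService code). It appears consistent with the log above, where one ElasticsearchParseException is reported with further failures attached as Suppressed entries, but the real implementation should still be checked, as suggested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a "collect failures, re-throw at the end" update loop.
// NOT IngestService; names and types are illustrative only.
final class PipelineUpdater {

    /**
     * Builds every pipeline definition independently. Successful builds are
     * stored in {@code installed}; failures are collected and, once the loop
     * is done, the first one is thrown with the rest attached as suppressed.
     */
    static <C, P> void updateAll(Map<String, C> definitions,
                                 Function<C, P> factory,
                                 Map<String, P> installed) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Map.Entry<String, C> entry : definitions.entrySet()) {
            try {
                installed.put(entry.getKey(), factory.apply(entry.getValue()));
            } catch (RuntimeException e) {
                failures.add(e); // one bad pipeline must not block the others
            }
        }
        if (!failures.isEmpty()) {
            RuntimeException first = failures.get(0);
            for (RuntimeException other : failures.subList(1, failures.size())) {
                first.addSuppressed(other);
            }
            throw first;
        }
    }
}
```

With this shape, a pipeline that fails to parse is simply missing from the installed map while the valid ones are still registered, which matches the concern that the final pipeline would be unavailable on older nodes rather than every pipeline breaking.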

I agree in this case it is best to wait for the cluster to be fully upgraded.

+1, we should never try to update/create these pipelines in a mixed cluster imo.

@afoucret
Contributor

afoucret commented May 3, 2023

Hey @davidkyle, thank you for the heads up.
#95780 provides a fix for it. We now avoid installing the pipelines until every node in the cluster satisfies a minimum version requirement.
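A version gate of this kind could look roughly like the following. It is a minimal sketch under stated assumptions: NodeVersion, PipelineInstallGate and canInstall are hypothetical names, versions are reduced to major.minor.patch, and 8.8.0 is assumed (not confirmed in this thread) to be the first version whose uri_parts processor accepts ignore_missing. The pipeline is installed only when every node in the cluster view is at or above that version, which is also the approach the ML team described for rolling upgrades.

```java
import java.util.Collection;

// Hypothetical version model; NOT the Elasticsearch Version class.
record NodeVersion(int major, int minor, int patch) implements Comparable<NodeVersion> {
    @Override
    public int compareTo(NodeVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }
}

// Hypothetical gate: only install the pipeline once the whole cluster can parse it.
final class PipelineInstallGate {
    // Assumption: first version whose uri_parts processor accepts ignore_missing.
    static final NodeVersion MIN_VERSION = new NodeVersion(8, 8, 0);

    static boolean canInstall(Collection<NodeVersion> nodeVersions) {
        // An empty cluster view gives no guarantee, so refuse to install.
        return !nodeVersions.isEmpty()
            && nodeVersions.stream().allMatch(v -> v.compareTo(MIN_VERSION) >= 0);
    }
}
```

With two upgraded nodes and one still on 8.0.1, canInstall returns false; once the last node is upgraded it returns true. Presumably such a check would be re-evaluated on each cluster-state update, so installation happens automatically when the cluster becomes fully upgraded.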

davidkyle added a commit that referenced this issue May 3, 2023
…fig (#95778)

In #95766 the ML trained model deployment upgrade tests fail
due to an invalid ingest processor configuration. ML stopped
parsing the full ingest pipeline in version 8.3.1, so the tests can
be re-enabled when upgrading from 8.3.1 or later.
davidkyle added a commit to davidkyle/elasticsearch that referenced this issue May 3, 2023
…fig (elastic#95778)

In elastic#95766 the ML trained model deployment upgrade tests fail
due to an invalid ingest processor configuration. ML stopped
parsing the full ingest pipeline in version 8.3.1, so the tests can
be re-enabled when upgrading from 8.3.1 or later.
elasticsearchmachine pushed a commit that referenced this issue May 3, 2023
…fig (#95778) (#95783)

In #95766 the ML trained model deployment upgrade tests fail
due to an invalid ingest processor configuration. ML stopped
parsing the full ingest pipeline in version 8.3.1, so the tests can
be re-enabled when upgrading from 8.3.1 or later.
@afoucret
Contributor

afoucret commented May 4, 2023

Hey @davidkyle, I have just merged #95780 which is fixing this issue.

@davidkyle
Member Author

@afoucret I re-enabled the ML tests that were failing (#96193), and I'm seeing the same processor [uri_parts] doesn't support one or more provided configuration parameters [ignore_missing] failure when the cluster is two-thirds upgraded.

I don't see the fix you linked to in the main branch anymore. Is there an alternative mechanism that should prevent this kind of error?

Here's the build failure: https://gradle-enterprise.elastic.co/s/ch256gh4tunnc/tests/:x-pack:qa:rolling-upgrade:v8.2.1%23twoThirdsUpgradedTest/org.elasticsearch.upgrades.MLModelDeploymentsUpgradeIT/testTrainedModelDeployment?top-execution=1

@davidkyle davidkyle reopened this May 17, 2023
@jimczi
Contributor

jimczi commented May 17, 2023

Oops, sorry @davidkyle, we had some back and forth on this one. The current implementation in main is in a bad state: the ingest pipeline management has been moved to core Elasticsearch, and the logic to check the version was removed. Thanks for reopening the issue; we'll work on a fix and update/close this issue when it's done (hopefully very soon).

@davidkyle
Member Author

Thanks @jimczi
