[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

brianf-aws · 2024-12-13T22:05:01Z

What is the bug?
Commit that introduces local models to MLInferenceIngestProcessor

On creation of the Ingest processor the MLInferenceIngestProcessor has no valid xContentRegistry which is used to create valid MLInput objects used for model prediction

Contrast this with the MLInferenceSearchResponseProcessor That has a valid contentRegistry, this processor is able to correctly make objects and can produce valid model input.

If you want to test this locally I reccomend you follow the steps to create a local embedding

Then run the following

PUT {{ _.domain }}/_ingest/pipeline/asymmetric_embedding_ingest_pipeline
{
	"description": "ingest passage text and generate a embedding using an asymmetric model",
	"processors": [
        {
	        "ml_inference": {
        
		        "model_input": "{\"text_docs\":[\"${input_map.text_docs}\"],\"target_response\":[\"sentence_embedding\"],\"parameters\":{\"content_type\":\"query\"}}",
		        "function_name": "text_embedding",
		        "model_id": "{{ _.model_id }}",
		        "input_map": [
			        {
				        "text_docs": "description"
			        }
		        ],
		        "output_map": [
			        {
				        "fact_embedding": "$.inference_results.*.output.*.data",
				        "embedding_size": "$.inference_results.*.output.*.shape[0]"
			        }
		        ]
	        }
        }
	]
}


### -------------------------- ack expected 
POST {{ _.domain }}/_ingest/pipeline/asymmetric_embedding_ingest_pipeline/_simulate
{
	"docs": [
          {
	          "_index": "my-index",
	          "_id": "1",
	          "_source": {
		          "title": "Central Park",
		          "description": "A large public park in the heart of New York City, offering a wide range of recreational activities.",
		          "content_type": "passage"
	          }
          }
	]
}

What is the expected behavior?
A proper embedding should appear as a result of the simulating the ingest pipeline.

What is your host/environment?

OS: 2.18
Version [e.g. 22]
Plugins

The text was updated successfully, but these errors were encountered:

brianf-aws · 2024-12-14T02:39:52Z

After investigating, I realized that this requires a open-search core code change along with a minor Ml-Commons change. More specifically, it requires that the IngestService class change its constructor to have a content registry object as a new parameter.
This will obviously cause a cascading amount of changes related to ingest service. However, this sort of code change was done previously as shown here which will pass indiceServices to ingest processors by modifying the node class.

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Line 1068 in 9a4166e

    
           new MLInferenceIngestProcessor.Factory(parameters.scriptService, parameters.client, xContentRegistry)

This is the most sensible solution as any new ingest processor can benefit from making objects using the xContentRegistry provided by the Node class. This also benefits the MlInferenceIngest processor to create custom ML Inputs based on any new model that comes up as shown by the Asymmetric Model Type .

Are there any easier ways to solve this issue?

Currently we can not circumvent this by using the createComponents method within MLPlugin

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Lines 458 to 470 in 9a4166e

    
           public Collection<Object> createComponents( 
        
               Client client, 
        
               ClusterService clusterService, 
        
               ThreadPool threadPool, 
        
               ResourceWatcherService resourceWatcherService, 
        
               ScriptService scriptService, 
        
               NamedXContentRegistry xContentRegistry, 
        
               Environment environment, 
        
               NodeEnvironment nodeEnvironment, 
        
               NamedWriteableRegistry namedWriteableRegistry, 
        
               IndexNameExpressionResolver indexNameExpressionResolver, 
        
               Supplier<RepositoriesService> repositoriesServiceSupplier 
        
           ) {

as the creation of the ingestProcessor gets called before the createComponents method

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Line 1063 in 9a4166e

    
           public Map<String, org.opensearch.ingest.Processor.Factory> getProcessors(org.opensearch.ingest.Processor.Parameters parameters) {

One way to easily remove the need for the code change is to change the order of creation with the node class. But this is a terrible idea as we will depend on the order of code (This is a huge headache for future contributors who decide to modify the relavent code discussed) to make sure everything is correct see below.

This is the order the code executes which explains why the xContentRegistry is null when creating the ingest processor

IngestService gets invoked (first )calling the factories of the processors this includes the MlInferenceIngestProcessor at which time in point the xContentRegistry is null
The plugins load their configurations (more specifically the MLPlugin), which service the xContentRegistry therefore the processor doesnt have the valid object in time.

austintlee · 2024-12-14T19:46:42Z

I think you should start with a PR in core and get feedback. Just to be clear, you want to introduce a new xContentRegistry member in org.opensearch.ingest.Processor.Parameters class, right? That's exactly what SearchPipeline is doing.

brianf-aws · 2024-12-16T18:03:54Z

Hey Austin! @austintlee, Yes thats part of it; this involves several code changes

IngestService has a xContentRegistry paramater
Give Ingest Processor Parameters a new xContentRegistry field, you might get mixed up since the Search pipeline has a same class name but it has the xContentRegistry which is why its able to work
Now within the MLPlugins class we do the following (what you suggested)

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Line 1068 in 9a4166e

new MLInferenceIngestProcessor.Factory(parameters.scriptService, parameters.client, xContentRegistry)

 new MLInferenceIngestProcessor.Factory(parameters.scriptService, parameters.client, paramters.xContentRegistry)

brianf-aws · 2024-12-16T18:40:30Z

Currently since the implementation of ingestProcessors have a null content registry we can force the xContentRegistry to have the registry as specified by the getNamedXContent() method

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Line 912 in 9a4166e

public List<NamedXContentRegistry.Entry> getNamedXContent() {

this is a temporary ML-Commons unblocker for using local models but I would need to have a OpenSearch core fix which would take awhile.

Here is the plan of action to make this happen. Within the specified order.

ML-Commons Temporary unblocker for parametrized local models to be used for MLInferenceIngest Processor : Adds preset contentRegistry for IngestProcessors #3281
OpenSearch Core change : [PR to be shared]
ML-Commons change with contentRegistry coming from core. : [PR to be shared]

brianf-aws added bug Something isn't working untriaged labels Dec 13, 2024

brianf-aws linked a pull request Dec 16, 2024 that will close this issue

Adds preset contentRegistry for IngestProcessors #3281

Open

5 tasks

brianf-aws mentioned this issue Dec 17, 2024

[META] IngestService needs xContentRegistry for parametized models. opensearch-project/OpenSearch#16865

Open

dhrubo-os added this to ml-commons projects Dec 17, 2024

dhrubo-os assigned brianf-aws Dec 17, 2024

dhrubo-os removed the untriaged label Dec 17, 2024

dhrubo-os moved this to In Progress in ml-commons projects Dec 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

brianf-aws commented Dec 13, 2024 •

edited

Loading

brianf-aws commented Dec 14, 2024 •

edited

Loading

austintlee commented Dec 14, 2024

brianf-aws commented Dec 16, 2024

brianf-aws commented Dec 16, 2024 •

edited

Loading

[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

Comments

brianf-aws commented Dec 13, 2024 • edited Loading

brianf-aws commented Dec 14, 2024 • edited Loading

Are there any easier ways to solve this issue?

austintlee commented Dec 14, 2024

brianf-aws commented Dec 16, 2024

brianf-aws commented Dec 16, 2024 • edited Loading

brianf-aws commented Dec 13, 2024 •

edited

Loading

brianf-aws commented Dec 14, 2024 •

edited

Loading

brianf-aws commented Dec 16, 2024 •

edited

Loading