Adds preset contentRegistry for IngestProcessors #3281

brianf-aws · 2024-12-16T18:34:13Z

Background

Currently, local models that rely on the parameters map in the request payload cannot build the objects needed for local model prediction when they are invoked through the MLInferenceIngestProcessor.
For example:

## This is a custom asymmetric model prediction request; this one passes
POST {{ _.domain }}/_plugins/_ml/_predict/text_embedding/{{ _.model_id }}
{
  "parameters": {
    "content_type": "query"
  },
  "text_docs": ["What day is it today?"],
  "target_response": ["sentence_embedding"]
}

## Ingest pipeline configuration (using the MLInferenceIngestProcessor) that fails to create the MLInput needed for model prediction

PUT {{ _.domain }}/_ingest/pipeline/asymmetric_embedding_ingest_pipeline
{
	"description": "ingest passage text and generate a embedding using an asymmetric model",
	"processors": [
		{
			"ml_inference": {

				"model_input": "{\"text_docs\":[\"${input_map.text_docs}\"],\"target_response\":[\"sentence_embedding\"],\"parameters\":{\"content_type\":\"query\"}}",
				"function_name": "text_embedding",
				"model_id": "{{ _.model_id }}",
				"input_map": [
					{
						"text_docs": "description"
					}
				],
				"output_map": [
					{
						"fact_embedding": "$.inference_results.*.output.*.data",
						"embedding_size": "$.inference_results.*.output.*.shape[0]"
					}
				]
			}
		}
	]
}

A proper fix requires an OpenSearch core change, because the processor needs the contentRegistry. However, since there is currently little dependency on the registry, we can give the processor the preset registry that the MachineLearningPlugin class exposes via its getNamedXContent() method. This temporarily unblocks the use case until an OpenSearch core fix provides a proper change.
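The idea of handing the processor a preset registry can be pictured with a small stdlib-only model. This is a sketch under stated assumptions, not the code in this PR: SketchRegistry, PresetPluginSketch and SketchProcessor are hypothetical stand-ins, and none of them use the real NamedXContentRegistry API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Simplified stand-in for a named-parser registry. NOT the OpenSearch NamedXContentRegistry API. */
final class SketchRegistry {
    private final Map<String, Function<Map<String, Object>, Object>> parsers;

    SketchRegistry(List<Map.Entry<String, Function<Map<String, Object>, Object>>> entries) {
        this.parsers = entries.stream().collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    /** Looks up the parser registered under the given name and applies it to the parameters map. */
    Object parse(String name, Map<String, Object> parameters) {
        Function<Map<String, Object>, Object> parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("unknown named object: " + name);
        }
        return parser.apply(parameters);
    }
}

/** Hypothetical plugin exposing its preset parser entries, loosely analogous to getNamedXContent(). */
final class PresetPluginSketch {
    static List<Map.Entry<String, Function<Map<String, Object>, Object>>> getNamedXContent() {
        // Hypothetical parser: turns the parameters map into a string describing the content type.
        Function<Map<String, Object>, Object> textEmbedding =
            params -> "content_type=" + params.get("content_type");
        return List.of(Map.entry("text_embedding", textEmbedding));
    }
}

/** Hypothetical processor that falls back to the plugin's preset registry when none is injected. */
final class SketchProcessor {
    private final SketchRegistry registry;

    SketchProcessor(SketchRegistry injected) {
        this.registry = injected != null
            ? injected
            : new SketchRegistry(PresetPluginSketch.getNamedXContent());
    }

    Object buildInput(String functionName, Map<String, Object> parameters) {
        return registry.parse(functionName, parameters);
    }
}
```

With this model, a processor constructed with a null registry can still parse the parameters map, which is the behavior the temporary fix aims for.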

Context

Please see this issue #3276.

Related Issues

Temporarily resolves #3276 while waiting for an OpenSearch core fix.

Check List

New functionality includes manual testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…d local models Currently local models that use the parameters map within the payload to create a request cannot create objects to be used for local model prediction. This requires an OpenSearch core change because it needs the contentRegistry; however, given there is not much dependency on the registry (currently), we can give it the preset registry given in the MachineLearningPlugin class via the getNamedXContent() method. Signed-off-by: Brian Flores <[email protected]>

austintlee · 2024-12-17T16:15:41Z

Can you create a test (maybe an IT test) that fails w/o this fix?

brianf-aws · 2024-12-17T20:15:37Z

Can you create a test (maybe an IT test) that fails w/o this fix?

Hey @austintlee, I'm a bit confused. Do you mean a test that shows this change works? If I create a test that is supposed to fail, i.e. one that uses parameters: { ... }, it will pass, since the processor can now create the MLInput properly. I'm also wondering whether we should really use the asymmetric model in an IT, since it is 283 MB.

Instead of an IT I added a UT, as the crux of the problem is in the ingest processor, not in any specific model. I hope that reasoning makes sense.

Signed-off-by: Brian Flores <[email protected]>

mingshl · 2024-12-17T21:49:58Z

@brianf-aws can you look up the usage of the parameters field and check what are the local model types that are impacted by this change? we know that asymmetric model type is impacted, wondering what other model types are impacted?

then we can test the changes to all impacted local model scenarios.

mingshl · 2024-12-17T22:03:14Z

I added the backport labels so that this bug fix goes all the way back to 2.15, when local model support was added to ingest processors (#2508, June 11). Please make sure the BWC tests pass in the backport PRs.

Also, it would be nice if a second set of eyes could double-check the versions, @brianf-aws: https://github.com/opensearch-project/ml-commons/commits/2.15

brianf-aws · 2024-12-17T23:18:15Z

@brianf-aws can you look up the usage of the parameters field and check what are the local model types that are impacted by this change? we know that asymmetric model type is impacted, wondering what other model types are impacted?

Good callout. I'm seeing a pattern where the MachineLearningPlugin is invoked by the Node class, which calls getNamedXContent(), e.g. here:

        NamedXContentRegistry xContentRegistry = new NamedXContentRegistry(
            Stream.of(
                NetworkModule.getNamedXContents().stream(),
                IndicesModule.getNamedXContents().stream(),
                searchModule.getNamedXContents().stream(),
                pluginsService.filterPlugins(Plugin.class).stream().flatMap(p -> p.getNamedXContent().stream()),
                ClusterModule.getNamedXWriteables().stream()
            ).flatMap(Function.identity()).collect(toList())
        );

So it will call the registries as specified here

ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java

Lines 912 to 930 in 67c562a

    
    public List<NamedXContentRegistry.Entry> getNamedXContent() {
        return ImmutableList
            .of(
                KMeansParams.XCONTENT_REGISTRY,
                LinearRegressionParams.XCONTENT_REGISTRY,
                AnomalyDetectionLibSVMParams.XCONTENT_REGISTRY,
                SampleAlgoParams.XCONTENT_REGISTRY,
                FitRCFParams.XCONTENT_REGISTRY,
                BatchRCFParams.XCONTENT_REGISTRY,
                LocalSampleCalculatorInput.XCONTENT_REGISTRY,
                MetricsCorrelationInput.XCONTENT_REGISTRY,
                AnomalyLocalizationInput.XCONTENT_REGISTRY_ENTRY,
                RCFSummarizeParams.XCONTENT_REGISTRY,
                LogisticRegressionParams.XCONTENT_REGISTRY,
                TextEmbeddingModelConfig.XCONTENT_REGISTRY,
                AsymmetricTextEmbeddingParameters.XCONTENT_REGISTRY
            );
    }
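The aggregation pattern in the Node snippet above (several streams of entries merged with flatMap(Function.identity())) can be reproduced with plain strings. This is only an illustrative sketch: RegistryAggregationSketch and its nested SketchPlugin interface are hypothetical, and the entries are reduced to names rather than real NamedXContentRegistry.Entry objects.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RegistryAggregationSketch {
    /** Hypothetical plugin contract: each plugin contributes the names of the parsers it registers. */
    interface SketchPlugin {
        List<String> getNamedXContent();
    }

    /** Merges core entries and every plugin's entries into one flat list, preserving order. */
    static List<String> collectEntries(List<String> coreEntries, List<SketchPlugin> plugins) {
        return Stream.of(
                coreEntries.stream(),
                plugins.stream().flatMap(p -> p.getNamedXContent().stream())
            )
            .flatMap(Function.identity())
            .collect(Collectors.toList());
    }
}
```

In this model, whatever a plugin returns from getNamedXContent() ends up in the single merged list, which mirrors why the plugin's own list is a reasonable preset for the processor.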

If you click through each registry entry, you will see that it invokes a method that parses and collects fields from the parameters map.

Here is an example for k-means (albeit not what we usually think of as a model). What this means is that the ML processor is able to run any of these models.

ml-commons/common/src/main/java/org/opensearch/ml/common/input/parameter/clustering/KMeansParams.java

Lines 70 to 88 in 684627a

    
    while (parser.nextToken() != XContentParser.Token.END_OBJECT) {
        String fieldName = parser.currentName();
        parser.nextToken();
        switch (fieldName) {
            case CENTROIDS_FIELD:
                k = parser.intValue(false);
                break;
            case ITERATIONS_FIELD:
                iterations = parser.intValue(false);
                break;
            case DISTANCE_TYPE_FIELD:
                distanceType = DistanceType.from(parser.text());
                break;
            default:
                parser.skipChildren();
                break;
        }
    }
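The same field-by-field pattern can be sketched without the XContentParser, using a plain map as the parameters payload. This is a simplified illustration, not the KMeansParams implementation: KMeansParamsSketch is hypothetical, and the field names "centroids", "iterations" and "distance_type" are assumed values for the constants shown above.

```java
import java.util.Map;

/** Hypothetical, simplified analogue of a params parser; field names are assumptions. */
final class KMeansParamsSketch {
    final Integer centroids;
    final Integer iterations;
    final String distanceType;

    private KMeansParamsSketch(Integer centroids, Integer iterations, String distanceType) {
        this.centroids = centroids;
        this.iterations = iterations;
        this.distanceType = distanceType;
    }

    /** Walks the parameters map, picks out known fields, and ignores everything else. */
    static KMeansParamsSketch parse(Map<String, Object> parameters) {
        Integer k = null;
        Integer iterations = null;
        String distanceType = null;
        for (Map.Entry<String, Object> field : parameters.entrySet()) {
            switch (field.getKey()) {
                case "centroids":
                    k = ((Number) field.getValue()).intValue();
                    break;
                case "iterations":
                    iterations = ((Number) field.getValue()).intValue();
                    break;
                case "distance_type":
                    distanceType = String.valueOf(field.getValue());
                    break;
                default:
                    // Unknown fields are skipped, analogous to parser.skipChildren().
                    break;
            }
        }
        return new KMeansParamsSketch(k, iterations, distanceType);
    }
}
```

Fields that are absent from the map simply stay null in this sketch, and unrecognized keys are ignored rather than rejected.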

dhrubo-os · 2024-12-17T23:27:45Z

RestBedRockInferenceIT > test_bedrock_multimodal_model FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:39501/], URI [/_plugins/_ml/models/_register], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.cluster.metadata.MappingMetadata.getSourceAsMap()\" because the return value of \"org.opensearch.cluster.metadata.IndexMetadata.mapping()\" is null"}],"type":"null_pointer_exception","reason":"Cannot invoke \"org.opensearch.cluster.metadata.MappingMetadata.getSourceAsMap()\" because the return value of \"org.opensearch.cluster.metadata.IndexMetadata.mapping()\" is null"},"status":500}

dhrubo-os · 2024-12-18T00:40:13Z

plugin/src/test/java/org/opensearch/ml/processor/MLInferenceIngestProcessorTests.java

+            false,
+            localModelInput
+        );
+        try {


    // Act & Assert: Verify NullPointerException and its message
    NullPointerException exception = assertThrows(
        NullPointerException.class,
        () -> processor.execute(ingestDocument, handler),
        "Expected NullPointerException due to null xContentRegistry"
    );
    assertTrue(exception.getMessage().contains("Cannot invoke"), "Exception message should indicate a failure due to null mlInput");

What do you think about this?

I like this, but the problem is that the exception is passed to the handler; it is not thrown by the method itself. So this wouldn't be possible. That is why this class, and more specifically this method, has a catch block that ensures no exception escapes, i.e. the exception only ever reaches the caller through the handler.
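The difference between a thrown exception and one delivered to a handler can be shown with a small stdlib-only model. This is a sketch under assumptions, not the MLInferenceIngestProcessor code: HandlerSketch, its execute signature and captureFailure are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;

final class HandlerSketch {
    /** Hypothetical processor method: failures are routed to the handler, never thrown to the caller. */
    static void execute(String document, Object registry, BiConsumer<String, Exception> handler) {
        try {
            if (registry == null) {
                throw new NullPointerException("registry is null");
            }
            handler.accept(document, null);
        } catch (Exception e) {
            // The catch block swallows the exception and hands it to the handler instead.
            handler.accept(null, e);
        }
    }

    /** Test-style helper: runs execute and returns whatever exception the handler received, or null. */
    static Exception captureFailure(String document, Object registry) {
        AtomicReference<Exception> captured = new AtomicReference<>();
        execute(document, registry, (doc, error) -> captured.set(error));
        return captured.get();
    }
}
```

In a model like this, an assertThrows around execute would never fire, so a test has to inspect what the handler received instead.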

dhrubo-os · 2024-12-18T00:42:18Z

plugin/src/test/java/org/opensearch/ml/processor/MLInferenceIngestProcessorTests.java

+     *
+     * @implNote If you check the stack trace of the test you will see it tells you that it's a direct consequence of xContentRegistry being null
+     */
+    public void testExecute_xContentRegistryNullWithLocalModel_throwsException() throws Exception {


Can't we add another test for success case with Asymmetric model?

Yes, it's possible, but that would involve hosting the model zip somewhere so we can test that it produces embeddings.

Can we take some idea from here: https://github.com/opensearch-project/ml-commons/pull/2123/files#diff-cd8365f3263802b111604d62b751e5264a2e51df44b46b9fb650ca001466328a

Oh! I didn't realize the community user did that, I can definitely use that thanks!

brianf-aws requested review from b4sjoo, dhrubo-os, jngz-es, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, austintlee, HenryL27 and xinyual as code owners December 16, 2024 18:34

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval December 16, 2024 18:34 — with GitHub Actions Inactive

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval December 16, 2024 18:34 — with GitHub Actions Failure

This was referenced Dec 16, 2024

[BUG] MLInferenceIngestProcessor has xContentRegistry as null #3276

Open

Tutorial for using Asymmetric models #3258

Open

Adds UT for proving models depend on xContentRegistry for prediction

30316f9

Signed-off-by: Brian Flores <[email protected]>

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval December 17, 2024 21:34 — with GitHub Actions Failure

apply spotless

f9caf1b

Signed-off-by: Brian Flores <[email protected]>

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval December 17, 2024 21:41 — with GitHub Actions Failure

mingshl added bug Something isn't working backport 2.x backport 2.18 backport 2.17 backport 2.16 labels Dec 17, 2024

mingshl added backport 2.15 and removed backport 2.15 backport 2.16 labels Dec 17, 2024

mingshl added backport 2.15 backport 2.16 labels Dec 17, 2024

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval December 17, 2024 23:49 — with GitHub Actions Inactive

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval December 18, 2024 00:32 — with GitHub Actions Failure

dhrubo-os reviewed Dec 18, 2024
