-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds preset contentRegistry for IngestProcessors #3281
base: main
Are you sure you want to change the base?
Adds preset contentRegistry for IngestProcessors #3281
Conversation
…d local models Curently local models that use the parameters map within the payload to create a request can not create objects to be used for local model prediction. This requires a opensearch core change because it needs the contentRegistry,however given there is not much dependency on the registry (currently) we can give it the preset registry given in the MachineLearningPlugin class vai the getNamedXContent() class Signed-off-by: Brian Flores <[email protected]>
Can you create a test (maybe an IT test) that fails w/o this fix? |
Instead of a IT i added a UT, as the crux of the problem is in the ingestProcessor not necessarily specific models. I hope you can understand my judgment |
Signed-off-by: Brian Flores <[email protected]>
Signed-off-by: Brian Flores <[email protected]>
@brianf-aws can you look up the usage of the parameters field and check what are the local model types that are impacted by this change? we know that asymmetric model type is impacted, wondering what other model types are impacted? then we can test the changes to all impacted local model scenarios. |
I added the backport labels to make sure this bug fix can fix all the way back to when the local model is supported in ingest processors on June 11, #2508 in 2.15, please make sure the bwc tests passed in the backport PRs. Also, would be nice if a second set of eyes can help double check the versions @brianf-aws https://github.com/opensearch-project/ml-commons/commits/2.15 |
Good call out Im seeing a pattern where the machineLearning plugin gets invoked by the Node class and gets NamedXContentRegistry xContentRegistry = new NamedXContentRegistry(
Stream.of(
NetworkModule.getNamedXContents().stream(),
IndicesModule.getNamedXContents().stream(),
searchModule.getNamedXContents().stream(),
pluginsService.filterPlugins(Plugin.class).stream().flatMap(p -> p.getNamedXContent().stream()),
ClusterModule.getNamedXWriteables().stream()
).flatMap(Function.identity()).collect(toList())
); So it will call the registries as specified here ml-commons/plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java Lines 912 to 930 in 67c562a
If you click through every registry it will invoke a method that parses and collects fields from the parameter map. Here is an example for k-means [Albeit not what we think of when we think of a model], What this means is that the ML Processor has the ability to run any models. Lines 70 to 88 in 684627a
|
|
false, | ||
localModelInput | ||
); | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Act & Assert: Verify NullPointerException and its message
NullPointerException exception = assertThrows(
NullPointerException.class,
() -> processor.execute(ingestDocument, handler),
"Expected NullPointerException due to null xContentRegistry"
);
assertTrue(exception.getMessage().contains("Cannot invoke"),
"Exception message should indicate a failure due to null mlInput");
What do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, but the problem is that the exception is passed by the handler its not done by the method itself. So this wouldn't be possible thats the reason why this class and more specifically this method has a catch to make sure that an exception is not possible. i.e. the handler passes an exception only.
* | ||
* @implNote If you check the stack trace of the test you will see it tells you that it's a direct consequence of xContentRegistry being null | ||
*/ | ||
public void testExecute_xContentRegistryNullWithLocalModel_throwsException() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we add another test for success case with Asymmetric model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes its possible but that would involve hosting the model zip somewhere so we can test that it makes embeddings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I didn't realize the community user did that, I can definitely use that thanks!
Background
Currently local models that use the parameters map within the payload to create a request can not create objects to be used for local model prediction, when using the MLInferenceIngestProcessor.
e.g.
This requires a opensearch core change because it needs the contentRegistry,however given there is not much dependency on the registry (currently) we can give it the preset registry given in the MachineLearningPlugin class via the getNamedXContent() class to temporarily unblock this use case while a OpenSearch Core fix gives a proper change.
Context
Please see this issue #3276.
Related Issues
Temporarily resolves #3276 while waiting for a OpenSearch core fix.
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.