Is your feature request related to a problem?
Problem Statement
The current implementation of the ML Inference Search Response Processor in OpenSearch 2.16 supports many-to-one inference, where multiple documents are collected into a list and sent as a single prediction request to the machine learning model. However, there are scenarios where users may want to perform one-to-one inference, where each document is sent as a separate prediction request to the model.
Some use cases for one-to-one inference include:
Reranking: In reranking scenarios, such as using XGBoost for ranking, the model typically takes a single document and compares it with the search string to return a single score. Sending multiple documents in a single request may not be suitable for such use cases.
Models with Single Input: Some machine learning models, like the Bedrock embedding model, accept only a single string as input (see the request sketch after this list). In such cases, sending multiple documents in one request is incompatible with the model's input requirements.
Customized Inference Logic: There may be scenarios where users need to perform customized inference logic on each document individually, which may not be possible with the many-to-one approach.
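To make the single-input constraint concrete, the sketch below shows the shape of a prediction request such a model expects. It assumes the Amazon Titan text embeddings request format, where inputText holds exactly one string; the field name is illustrative and other Bedrock models may differ.

```json
{
  "inputText": "OpenSearch is a community-driven, open source search and analytics suite."
}
```

Because the payload carries a single string, a many-to-one processor would have to either concatenate documents or fail the request; sending one prediction request per hit avoids both.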
Solution Proposal
To address the need for one-to-one inference, we propose adding a new configuration option one_to_one to the ML Inference Search Response Processor. This option will allow users to specify whether they want to perform many-to-one inference (the current default behavior) or one-to-one inference.

When one_to_one is set to true, the processor will handle the search response as follows:
Separate the search response into individual one-hit search responses, where each response contains a single document.
For each one-hit search response, create a separate prediction request and send it to the machine learning model.
After receiving the prediction results for each document, combine the individual responses back into a single search response with the updated documents.
This approach ensures that each document is processed individually by the machine learning model, enabling support for use cases like reranking and models that accept single inputs.
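As a sketch of how the option could be exposed to users, the search pipeline below enables the proposed flag on the ml_inference response processor. The model_id, the field names in input_map and output_map, and the pipeline name are placeholders, and the exact name and placement of one_to_one are part of this proposal rather than a shipped API.

```json
PUT /_search/pipeline/my_rerank_pipeline
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "<model_id>",
        "one_to_one": true,
        "input_map": [
          {
            "document_text": "passage_text"
          }
        ],
        "output_map": [
          {
            "rerank_score": "score"
          }
        ]
      }
    }
  ]
}
```

With one_to_one set to false (or omitted), the processor keeps today's behavior and batches all hits into a single prediction request.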
What solution would you like?
The proposed solution will involve the following changes:
Modify the MLInferenceSearchResponseProcessor class to introduce the one_to_one configuration option and handle the logic for separating and combining search responses.
Update the processResponseAsync method to handle the one-to-one inference flow, including creating individual prediction requests and combining the results.
Introduce new helper methods or classes as needed to facilitate the separation and combination of search responses.
Update the documentation and examples to reflect the new one_to_one configuration option and its usage.
By implementing this solution, users will have the flexibility to choose between many-to-one inference (the current default behavior) and one-to-one inference, depending on their specific use case and model requirements.
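For completeness, a query that exercises such a pipeline is sketched below; the index name, pipeline name, and field are illustrative. With one_to_one enabled, each returned hit corresponds to its own prediction request instead of sharing one batched call.

```json
GET /my_index/_search?search_pipeline=my_rerank_pipeline
{
  "query": {
    "match": {
      "passage_text": "how do search pipelines work"
    }
  }
}
```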
Do you have any additional context?
[META Issue] #2839
[RFC for ML Inference Processors] #2173