Is your feature request related to a problem?
Problem Statement
The current implementation of the ML Inference Search Response Processor in OpenSearch 2.16 supports many-to-one inference, where multiple documents are collected into a list and sent as a single prediction request to the machine learning model. However, there are scenarios where users may want to perform one-to-one inference, where each document is sent as a separate prediction request to the model.
Some use cases for one-to-one inference include:
Reranking: In reranking scenarios, such as using XGBoost for ranking, the model typically takes a single document and compares it with the search string to return a single score. Sending multiple documents in a single request may not be suitable for such use cases.
Models with Single Input: Some machine learning models, like the Bedrock embedding model, accept only a single string as input (see the request sketch after this list). In such cases, sending multiple documents in one request is incompatible with the model's input requirements.
Customized Inference Logic: There may be scenarios where users need to perform customized inference logic on each document individually, which may not be possible with the many-to-one approach.
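To make the single-input constraint concrete, the sketch below shows the shape of a prediction request such a model expects. It assumes the Amazon Titan text embeddings request format, where inputText holds exactly one string; the field name is illustrative and other Bedrock models may differ.

```json
{
  "inputText": "OpenSearch is a community-driven, open source search and analytics suite."
}
```

Because the payload carries a single string, a many-to-one processor would have to either concatenate documents or fail the request; sending one prediction request per hit avoids both.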
Solution Proposal
To address the need for one-to-one inference, we propose adding a new configuration option one_to_one to the ML Inference Search Response Processor. This option will allow users to specify whether they want to perform many-to-one inference (the current default behavior) or one-to-one inference.

When one_to_one is set to true, the processor will handle the search response as follows:
Separate the search response into individual one-hit search responses, where each response contains a single document.
For each one-hit search response, create a separate prediction request and send it to the machine learning model.
After receiving the prediction results for each document, combine the individual responses back into a single search response with the updated documents.
This approach ensures that each document is processed individually by the machine learning model, enabling support for use cases like reranking and models that accept single inputs.
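As a sketch of how the option could be exposed to users, the search pipeline below enables the proposed flag on the ml_inference response processor. The model_id, the field names in input_map and output_map, and the pipeline name are placeholders, and the exact name and placement of one_to_one are part of this proposal rather than a shipped API.

```json
PUT /_search/pipeline/my_rerank_pipeline
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "<model_id>",
        "one_to_one": true,
        "input_map": [
          {
            "document_text": "passage_text"
          }
        ],
        "output_map": [
          {
            "rerank_score": "score"
          }
        ]
      }
    }
  ]
}
```

With one_to_one set to false (or omitted), the processor keeps today's behavior and batches all hits into a single prediction request.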
What solution would you like?
The proposed solution will involve the following changes:
Modify the MLInferenceSearchResponseProcessor class to introduce the one_to_one configuration option and handle the logic for separating and combining search responses.
Update the processResponseAsync method to handle the one-to-one inference flow, including creating individual prediction requests and combining the results.
Introduce new helper methods or classes as needed to facilitate the separation and combination of search responses.
Update the documentation and examples to reflect the new one_to_one configuration option and its usage.
By implementing this solution, users will have the flexibility to choose between many-to-one inference (the current default behavior) and one-to-one inference, depending on their specific use case and model requirements.
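For completeness, a query that exercises such a pipeline is sketched below; the index name, pipeline name, and field are illustrative. With one_to_one enabled, each returned hit corresponds to its own prediction request instead of sharing one batched call.

```json
GET /my_index/_search?search_pipeline=my_rerank_pipeline
{
  "query": {
    "match": {
      "passage_text": "how do search pipelines work"
    }
  }
}
```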
Do you have any additional context?
[META Issue] #2839
[RFC for ML Inference Processors] #2173