
[FEATURE] Support conversational search in ML Inference Search Response Processor with memory #3242

Open
mingshl opened this issue Nov 27, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@mingshl
Collaborator

mingshl commented Nov 27, 2024

Is your feature request related to a problem?
To support conversational search, the request sent to the remote model needs to carry not only the question but also the historical context.

For example,
OpenAI API:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you. How can I assist you today?"},
    {"role": "user", "content": "What's the weather like?"}
]

Bedrock converse API:

"messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "Write an article about impact of high inflation to GDP of a country"
                }
            ]
        }
    ]
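
In both formats, the prior turns and the new question travel in a single messages array. For illustration, a multi-turn Bedrock Converse request carrying history plus a follow-up question might look like this (the example content is hypothetical, not taken from the Bedrock docs):

"messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "Write an article about impact of high inflation to GDP of a country"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "text": "High inflation tends to erode purchasing power and dampen GDP growth ..."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "text": "Summarize that article in two sentences"
                }
            ]
        }
    ]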

In the ML Inference Search Response Processor, introduce a new parameter, "conversational_search", which can be true or false. When it is true and the input_map is configured to read the memory ID from the query extension, the ML inference processor reads the memory via the GetConversationsRequest action and sends the message list together with the new question to the remote model API.

{
 "ml_inference": {
   "model_id": "<model_id>",
   "conversational_search": true,
   "function_name": "<function_name>",
   "full_response_path": "<full_response_path>",
   "conversation_search": {
     "memory_input": """{"message": {"role": "${parameters.role}", "content": "${parameters.content}"}}""",  // optional
     "memory_output": """{"message": {"role": "assistant", "content": "${DataAsMap.response}"}}"""
   },
   "model_config": {
     "<model_config_field>": "<config_value>"
   },
   "model_input": "<model_input>",
   "input_map": [
     {
       "memory_id": "$._query.ext.ml_inference.memory_id",
       "content": "$._query.ext.ml_inference.question"
     }
   ],
   "output_map": [
     {
       "<new_document_field>": "<model_output_field>"
     }
   ],
   "override": "<override>",
   "one_to_one": false
 }
}
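
After the model responds, memory_output is intended to append the new turn back into the same memory, so the stored message would look roughly like this (a sketch derived from the memory_output template above; the exact persisted shape is part of this proposal):

{"message": {"role": "assistant", "content": "Lincoln is generally ranked among the most effective U.S. presidents."}}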

When searching, users can use the ML inference search extension in the query to ask a question and pass the memory ID.

GET /my_rag_test_data/_search?search_pipeline=rag_pipeline
{
  "query": {
    "match": {
      "text": "Abraham Lincoln"
    }
  },
  "ext": {
    "ml_inference": {
      "llm_question": "Was Abraham Lincoln a good politician",
      "memory_id": "iXC4bI0BfUsSoeNTjS30"
    }
  }
}
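
The memory_id passed in the ext block is assumed to come from the existing ML Commons memory APIs, e.g. a memory created ahead of time whose messages the processor then reads back. A sketch of that flow using the documented memory endpoints (the memory ID below is just the example value from above):

POST /_plugins/_ml/memory
{
  "name": "Conversation about Abraham Lincoln"
}

GET /_plugins/_ml/memory/iXC4bI0BfUsSoeNTjS30/messages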

To reuse the current memory and message APIs, the proposal is to add a new field to the interaction and message APIs that allows a custom message.

Proposed new interface for the message/interaction:

@input 
structure CreateInteractionInput  {
    @required
    @httpLabel
    conversationId: ConversationId
    input: String
    prompt: String
    response: String
    agent: String
    customMessage: Object
    attributes: InteractionAttributes
}
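
For example, a Create Message call could then carry a model-specific message body in the proposed custom message field (hypothetical request: input and response are existing message fields, while custom_message is the new field proposed here):

POST /_plugins/_ml/memory/iXC4bI0BfUsSoeNTjS30/messages
{
  "input": "Was Abraham Lincoln a good politician",
  "response": "Lincoln is generally ranked among the most effective U.S. presidents.",
  "custom_message": {
    "role": "assistant",
    "content": "Lincoln is generally ranked among the most effective U.S. presidents."
  }
}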


mingshl added the enhancement and untriaged labels on Nov 27, 2024
@Distorted-Dundar

Hey,

Is the memory ID an index? If so, what are the responsibilities of this index? It looks like the ML inference processor will load the index and store info in it on return. Is there a restriction on this memory ID, or can any other index be used?

How do we know we should clean up these memories after some time?

@austintlee
Collaborator

I worry that this might blur the line between the ML response processor and the existing RAG processor. We may be adding too much to the ML inference processor interface.

@mingshl
Collaborator Author

mingshl commented Nov 29, 2024

I worry that this might blur the line between the ML response processor and the existing RAG processor. We may be adding too much to the ML inference processor interface.

Hi @austintlee, we are working on the OpenSearch Flow project; you can refer to the tutorial here: https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/tutorial.md

OpenSearch Flow aims to use the ML Inference Processors (ingest/search) as generic processors to run inference during ingest and search in a workflow, simplifying setup and configuration. Of course, if users are familiar with the RAG processor or other existing processors, they can use those as well; there are drop-down options in the processor selection that users can pick and bundle, so it is up to users to choose what fits their use cases.

@mingshl
Collaborator Author

mingshl commented Nov 29, 2024

Hey,

Is the memory ID an index? If so, what are the responsibilities of this index? It looks like the ML inference processor will load the index and store info in it on return. Is there a restriction on this memory ID, or can any other index be used?

How do we know we should clean up these memories after some time?

Yes, the memory would be stored in an index. Indeed, the memory and message APIs have already been released; check out these docs: https://opensearch.org/docs/latest/ml-commons-plugin/api/memory-apis/get-memory/ and https://opensearch.org/docs/latest/ml-commons-plugin/api/memory-apis/get-message/
