
[Observability AI Assistant] Fully migrate to inference client #197630

Closed · 2 tasks done · dgieselaar opened this issue Oct 24, 2024 · 9 comments · Fixed by #199286

Labels: Team:Obs AI Assistant Observability AI Assistant

Comments

dgieselaar (Member) commented on Oct 24, 2024:

We currently use the inference client only in the NL-to-ESQL task. We should fully migrate to it, replacing all instances of `client.chatComplete()` and `client.chat()` with `inferenceClient.chatComplete()` and `inferenceClient.output()`. This means less maintenance for us, and gives us a single place in Kibana where we handle, and can improve, LLM interactions.
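For illustration, a minimal sketch of the target call shape (the connector id, system prompt, and import path are assumptions, not taken from this issue):

```ts
// Assumption: MessageRole comes from the inference plugin's shared types;
// the exact import path varies by Kibana version.
import { MessageRole } from '@kbn/inference-common';

// After the migration, a call site would go through the shared client:
const response = await inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector', // illustrative connector id
  system: 'You are a helpful assistant', // illustrative system prompt
  messages: [{ role: MessageRole.User, content: 'Do something' }],
});
// In non-streaming mode, response.content holds the model output.
```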

Dependencies

  1. Team:AI Infra
  2. Team:AI Infra
dgieselaar added the Team:Obs AI Assistant label on Oct 24, 2024
elasticmachine (Contributor) commented:
Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

arturoliduena self-assigned this on Oct 31, 2024
dgieselaar (Member, Author) commented:

This is the inferenceClient:

```ts
export interface InferenceClient {
```

You can get one from the InferencePlugin's start contract:

```ts
getClient: ({ request }) => {
```

`inferenceClient.chatComplete` is similar to `observabilityAIAssistantClient.chat`; `observabilityAIAssistantClient.complete` does a bunch of stuff on top of `chat`. Let's focus first on replacing our usage of `observabilityAIAssistantClient.chat` with `inferenceClient.chatComplete`. I think we can keep `observabilityAIAssistantClient.chat` as a wrapper for now because it also adds instrumentation and logging.

After making this change, we should be able to remove all the adapters: https://github.com/elastic/kibana/tree/main/x-pack/plugins/observability_solution/observability_ai_assistant/server/service/client/adapters
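As a rough sketch (names and signature are illustrative, not the actual code), the wrapper could look like this:

```ts
// Hypothetical thin wrapper: keeps our instrumentation and logging, but
// delegates the actual LLM call to the inference client.
async function chat(
  name: string,
  { connectorId, messages }: { connectorId: string; messages: Message[] }
) {
  logger.debug(`chat(${name}) delegating to inferenceClient.chatComplete`);
  return inferenceClient.chatComplete({
    connectorId,
    // role mapping from our Message type; see convertMessagesForInference below
    messages: convertMessagesForInference(messages),
  });
}
```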

arturoliduena (Contributor) commented:

@dgieselaar Thank you for the detailed information on migrating to inferenceClient. I have some questions regarding the message type mapping to ensure compatibility:

  1. Mapping message types:
    • In the Observability AI Assistant, we have a Message interface with the roles System, Assistant, User, Function, and Elastic:

```ts
export enum MessageRole {
  System = 'system',
  Assistant = 'assistant',
  User = 'user',
  Function = 'function',
  Elastic = 'elastic',
}

export interface Message {
  '@timestamp': string;
  message: {
    content?: string;
    name?: string;
    role: MessageRole;
    function_call?: {
      name: string;
      arguments?: string;
      trigger: MessageRole.Assistant | MessageRole.User | MessageRole.Elastic;
    };
    data?: string;
  };
}
```

while in the inference client, each role has its own message type:

```ts
export type Message = UserMessage | AssistantMessage | ToolMessage<unknown>;
```

  • The User and Assistant roles in the Observability AI Assistant map directly to UserMessage and AssistantMessage in the inference client; we can use the content field as-is.
  • How should we map Elastic, System, and Function messages? They have no direct counterpart in the inference client. Should we treat them as Tool messages, or handle them differently?
  2. Function calling:

    • For inferenceClient.chatComplete, we're setting functionCalling to 'simulated' or 'native' based on the simulateFunctionCalling flag. Does this align with the intended design for handling function calls?
  3. Removing adapters:

    • Once the migration to inferenceClient is confirmed, is there a specific process for removing the adapters, or is it safe to remove them once the functionality is verified?

dgieselaar (Member, Author) commented:

> The User and Assistant roles in the Observability AI Assistant map directly to UserMessage and AssistantMessage in the inference client; we can use the content field as-is. How should we map Elastic, System, and Function messages? They have no direct counterpart in the inference client. Should we treat them as Tool messages, or handle them differently?

We have a `convertMessagesForInference` function for this, I think:

```ts
export function convertMessagesForInference(messages: Message[]): InferenceMessage[] {
```

(We use this in the query function.)
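A hedged usage sketch (the import path is illustrative, not the real location):

```ts
// Convert Observability AI Assistant messages (System/User/Assistant/
// Function/Elastic roles) into the inference plugin's message union
// before handing them to chatComplete.
import { convertMessagesForInference } from './convert_messages_for_inference'; // illustrative path

const response = await inferenceClient.chatComplete({
  connectorId,
  messages: convertMessagesForInference(messages),
});
```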

> For inferenceClient.chatComplete, we're setting functionCalling to 'simulated' or 'native' based on the simulateFunctionCalling flag. Does this align with the intended design for handling function calls?

I'm not sure I get this question: what do you mean by "the intended design for handling function calls"? Just to clarify, it's just a different way of setting the flag; both clients use the same mechanism (the implementation in the inference plugin was ported over from our plugin).
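In other words, something like this (a sketch assuming `simulateFunctionCalling` is a boolean available at the call site):

```ts
const response = await inferenceClient.chatComplete({
  connectorId,
  messages,
  // Same underlying mechanism either way; only how the flag is set differs.
  functionCalling: simulateFunctionCalling ? 'simulated' : 'native',
});
```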

> Once the migration to inferenceClient is confirmed, is there a specific process for removing the adapters, or is it safe to remove them once the functionality is verified?

We can just delete them, I think, with one caveat: I believe we hardcode simulated function calling for Bedrock and Gemini, and that should no longer be necessary. The inference plugin has an implementation (thanks to @pgayvallet) that delegates to native function calling for both Bedrock and Gemini.

pgayvallet (Contributor) commented:

Yeah, that's correct. The inference APIs have a functionCalling parameter, but in practice, at the moment, passing 'simulated' only takes effect for the OpenAI connector; both Bedrock and Gemini always use native function calling. If we really wanted, I could wire simulated function calling for Bedrock and Gemini, but I'm not sure it would really be useful. Anyway: yes, you probably don't need to manage simulated function calling manually if you switch to the inference APIs.

arturoliduena (Contributor) commented:

Thanks @pgayvallet and @dgieselaar for the clarification. I'm following the documentation to get token count information in streaming mode. I've set stream: true to enable streaming, expecting the token counts either on the chatCompletionMessage event or on an event of type ChatCompletionEventType.ChatCompletionTokenCount. However, I'm not seeing the token count in this mode.

This is the example from the documentation:

```ts
const chatResponse = inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector',
  system: `Here is my system message`,
  messages: [
    {
      role: MessageRole.User,
      content: 'Do something',
    },
  ],
});

const { content, tokens } = chatResponse;
// do something with the output
```

Could you clarify whether the token count is accessible directly in streaming mode, or should I handle it differently? If possible, an example of how to access the token count in streaming mode would be helpful.
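For reference, a sketch of what consuming the token-count event might look like, assuming the stream emits a ChatCompletionTokenCount event as named above (the import path and the `tokens` shape are assumptions):

```ts
import { ChatCompletionEventType, MessageRole } from '@kbn/inference-common'; // path is an assumption

const events$ = inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector',
  stream: true, // streaming mode returns an observable of events
  messages: [{ role: MessageRole.User, content: 'Do something' }],
});

events$.subscribe((event) => {
  if (event.type === ChatCompletionEventType.ChatCompletionTokenCount) {
    // Assumed shape: { prompt, completion, total }
    console.log('token counts', event.tokens);
  }
});
```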

arturoliduena (Contributor) commented:

Additionally, I’d like to know how to properly abort a chat completion request. Is there a recommended approach for canceling an in-progress chat in streaming mode, or any specific method provided in the inference client for handling this?

pgayvallet (Contributor) commented on Nov 19, 2024:

> I'd like to know how to properly abort a chat completion request

AFAIK there's no way to abort a streaming request at the moment. Not at the inference client's level for sure, but more importantly, not at the connector's level.

We could eventually expose that at the inference client's level via some abort-controller pattern, but without being able to properly cancel the streamed action at the connector's level, I'm not sure it would be very useful in practice in terms of performance or resource-release gains.

Still, if that's something that o11y would want, we could expose that API, even if in practice it would basically just complete the observable on call, and see later if we can wire it up properly to perform "real" cancellation at the lower layers.

EDIT: actually, it seems the connector does support passing an abort signal:

```ts
public async invokeStream(
  body: InvokeAIActionParams,
  connectorUsageCollector: ConnectorUsageCollector
): Promise<PassThrough> {
  const { signal, timeout, ...rest } = body;
```
So we can probably leverage that.
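For example, something along these lines (a sketch: the `abortSignal` option is hypothetical here; exposing it is what #200757 discusses):

```ts
const controller = new AbortController();

const events$ = inferenceClient.chatComplete({
  connectorId,
  messages,
  stream: true,
  abortSignal: controller.signal, // hypothetical option, pending #200757
});

// Later, e.g. when the user navigates away:
controller.abort();
```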

pgayvallet (Contributor) commented:

I opened #200757 to discuss request cancellation. Insights very welcome.

kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue on Dec 9, 2024:

… (elastic#199286)

## Summary

Closes elastic#183245

Closes elastic#197630
[Observability AI Assistant] Partially migrate to inference client

Replacing `observabilityAIAssistantClient.chat` with `inferenceClient.chatComplete`; `observabilityAIAssistantClient.complete` does a bunch of stuff on top of `chat`. Keeping `observabilityAIAssistantClient.chat` as a wrapper for now because it also adds instrumentation and logging.

(cherry picked from commit df0dfa5)
kibanamachine added a commit that referenced this issue on Dec 9, 2024: … (#199286) (#203399)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Observability AI Assistant] migrate to inference client #197630 (#199286)](#199286)

Co-authored-by: Arturo Lidueña <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Dec 9, 2024 (elastic#199286, same commit message as above).
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/kibana that referenced this issue on Dec 10, 2024 (elastic#199286, same commit message as above).
mykolaharmash pushed a commit to mykolaharmash/kibana that referenced this issue on Dec 11, 2024 (elastic#199286, same commit message as above).
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Dec 12, 2024 (elastic#199286, same commit message as above).