
[Observability AI Assistant] Fully migrate to inference client #197630

Closed · 2 tasks done · dgieselaar opened this issue Oct 24, 2024 · 9 comments · Fixed by #199286

Labels: Team:Obs AI Assistant Observability AI Assistant

Comments

dgieselaar (Member) commented on Oct 24, 2024:

We currently use the inference client only in the NL-to-ESQL task. We should fully migrate to it, replacing all instances of `client.chatComplete()` and `client.chat()` with `inferenceClient.chatComplete()` and `inferenceClient.output()`. This means less maintenance for us, and gives us a single place in Kibana where we handle, and can improve, LLM interactions.
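For illustration, a minimal sketch of the target call shape (the connector id, system prompt, and import path are assumptions, not taken from this issue):

```ts
// Assumption: MessageRole comes from the inference plugin's shared types;
// the exact import path varies by Kibana version.
import { MessageRole } from '@kbn/inference-common';

// After the migration, a call site would go through the shared client:
const response = await inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector', // illustrative connector id
  system: 'You are a helpful assistant', // illustrative system prompt
  messages: [{ role: MessageRole.User, content: 'Do something' }],
});
// In non-streaming mode, response.content holds the model output.
```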

Dependencies

  1. Team:AI Infra
  2. Team:AI Infra
dgieselaar added the Team:Obs AI Assistant label on Oct 24, 2024
elasticmachine (Contributor) commented:
Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

arturoliduena self-assigned this on Oct 31, 2024
dgieselaar (Member, Author) commented:

This is the inferenceClient:

```ts
export interface InferenceClient {
```

You can get one from the InferencePlugin's start contract:

```ts
getClient: ({ request }) => {
```

`inferenceClient.chatComplete` is similar to `observabilityAIAssistantClient.chat`; `observabilityAIAssistantClient.complete` does a bunch of stuff on top of `chat`. Let's focus first on replacing our usage of `observabilityAIAssistantClient.chat` with `inferenceClient.chatComplete`. I think we can keep `observabilityAIAssistantClient.chat` as a wrapper for now because it also adds instrumentation and logging.

After making this change, we should be able to remove all the adapters: https://github.com/elastic/kibana/tree/main/x-pack/plugins/observability_solution/observability_ai_assistant/server/service/client/adapters
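As a rough sketch (names and signature are illustrative, not the actual code), the wrapper could look like this:

```ts
// Hypothetical thin wrapper: keeps our instrumentation and logging, but
// delegates the actual LLM call to the inference client.
async function chat(
  name: string,
  { connectorId, messages }: { connectorId: string; messages: Message[] }
) {
  logger.debug(`chat(${name}) delegating to inferenceClient.chatComplete`);
  return inferenceClient.chatComplete({
    connectorId,
    // role mapping from our Message type; see convertMessagesForInference below
    messages: convertMessagesForInference(messages),
  });
}
```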

arturoliduena (Contributor) commented:

@dgieselaar Thank you for the detailed information on migrating to inferenceClient. I have some questions regarding the message type mapping to ensure compatibility:

  1. Mapping message types:
    • In the Observability AI Assistant, we have a Message interface with the roles System, Assistant, User, Function, and Elastic:

```ts
export enum MessageRole {
  System = 'system',
  Assistant = 'assistant',
  User = 'user',
  Function = 'function',
  Elastic = 'elastic',
}

export interface Message {
  '@timestamp': string;
  message: {
    content?: string;
    name?: string;
    role: MessageRole;
    function_call?: {
      name: string;
      arguments?: string;
      trigger: MessageRole.Assistant | MessageRole.User | MessageRole.Elastic;
    };
    data?: string;
  };
}
```

while in the inference client, each role has its own message type:

```ts
export type Message = UserMessage | AssistantMessage | ToolMessage<unknown>;
```

  • The User and Assistant roles in the Observability AI Assistant map directly to UserMessage and AssistantMessage in the inference client; we can use the content field as-is.
  • How should we map Elastic, System, and Function messages? They have no direct counterpart in the inference client. Should we treat them as Tool messages, or handle them differently?
  2. Function calling:

    • For inferenceClient.chatComplete, we're setting functionCalling to 'simulated' or 'native' based on the simulateFunctionCalling flag. Does this align with the intended design for handling function calls?
  3. Removing adapters:

    • Once the migration to inferenceClient is confirmed, is there a specific process for removing the adapters, or is it safe to remove them once the functionality is verified?

dgieselaar (Member, Author) commented:

> The User and Assistant roles in the Observability AI Assistant map directly to UserMessage and AssistantMessage in the inference client; we can use the content field as-is. How should we map Elastic, System, and Function messages? They have no direct counterpart in the inference client. Should we treat them as Tool messages, or handle them differently?

We have a `convertMessagesForInference` function for this, I think:

```ts
export function convertMessagesForInference(messages: Message[]): InferenceMessage[] {
```

(We use this in the query function.)
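A hedged usage sketch (the import path is illustrative, not the real location):

```ts
// Convert Observability AI Assistant messages (System/User/Assistant/
// Function/Elastic roles) into the inference plugin's message union
// before handing them to chatComplete.
import { convertMessagesForInference } from './convert_messages_for_inference'; // illustrative path

const response = await inferenceClient.chatComplete({
  connectorId,
  messages: convertMessagesForInference(messages),
});
```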

> For inferenceClient.chatComplete, we're setting functionCalling to 'simulated' or 'native' based on the simulateFunctionCalling flag. Does this align with the intended design for handling function calls?

I'm not sure I get this question: what do you mean by "the intended design for handling function calls"? Just to clarify, it's just a different way of setting the flag; both clients use the same mechanism (the implementation in the inference plugin was ported over from our plugin).
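In other words, something like this (a sketch assuming `simulateFunctionCalling` is a boolean available at the call site):

```ts
const response = await inferenceClient.chatComplete({
  connectorId,
  messages,
  // Same underlying mechanism either way; only how the flag is set differs.
  functionCalling: simulateFunctionCalling ? 'simulated' : 'native',
});
```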

> Once the migration to inferenceClient is confirmed, is there a specific process for removing the adapters, or is it safe to remove them once the functionality is verified?

We can just delete them, I think, with one caveat: I believe we hardcode simulated function calling for Bedrock and Gemini, and that should no longer be necessary. The inference plugin has an implementation (thanks to @pgayvallet) that delegates to native function calling for both Bedrock and Gemini.

pgayvallet (Contributor) commented:

Yeah, that's correct. The inference APIs have a functionCalling parameter, but in practice, at the moment, passing 'simulated' only takes effect for the OpenAI connector; both Bedrock and Gemini always use native function calling. If we really wanted, I could wire simulated function calling for Bedrock and Gemini, but I'm not sure it would really be useful. Anyway: yes, you probably don't need to manage simulated function calling manually if you switch to the inference APIs.

arturoliduena (Contributor) commented:

Thanks @pgayvallet and @dgieselaar for the clarification. I'm following the documentation to get token count information in streaming mode. I've set stream: true to enable streaming, expecting the token counts either on the chatCompletionMessage event or on an event of type ChatCompletionEventType.ChatCompletionTokenCount. However, I'm not seeing the token count in this mode.

This is the example from the documentation:

```ts
const chatResponse = inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector',
  system: `Here is my system message`,
  messages: [
    {
      role: MessageRole.User,
      content: 'Do something',
    },
  ],
});

const { content, tokens } = chatResponse;
// do something with the output
```

Could you clarify whether the token count is accessible directly in streaming mode, or should I handle it differently? If possible, an example of how to access the token count in streaming mode would be helpful.
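For reference, a sketch of what consuming the token-count event might look like, assuming the stream emits a ChatCompletionTokenCount event as named above (the import path and the `tokens` shape are assumptions):

```ts
import { ChatCompletionEventType, MessageRole } from '@kbn/inference-common'; // path is an assumption

const events$ = inferenceClient.chatComplete({
  connectorId: 'some-gen-ai-connector',
  stream: true, // streaming mode returns an observable of events
  messages: [{ role: MessageRole.User, content: 'Do something' }],
});

events$.subscribe((event) => {
  if (event.type === ChatCompletionEventType.ChatCompletionTokenCount) {
    // Assumed shape: { prompt, completion, total }
    console.log('token counts', event.tokens);
  }
});
```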

arturoliduena (Contributor) commented:

Additionally, I’d like to know how to properly abort a chat completion request. Is there a recommended approach for canceling an in-progress chat in streaming mode, or any specific method provided in the inference client for handling this?

pgayvallet (Contributor) commented on Nov 19, 2024:

> I'd like to know how to properly abort a chat completion request

AFAIK there's no way to abort a streaming request at the moment. Not at the inference client's level for sure, but more importantly, not at the connector's level.

We could eventually expose that at the inference client's level via some abort-controller pattern, but without being able to properly cancel the streamed action at the connector's level, I'm not sure it would be very useful in practice in terms of performance or resource-release gains.

Still, if that's something that o11y would want, we could expose that API, even if in practice it would basically just complete the observable on call, and see later if we can wire it up properly to perform "real" cancellation at the lower layers.

EDIT: actually, it seems the connector does support passing an abort signal:

```ts
public async invokeStream(
  body: InvokeAIActionParams,
  connectorUsageCollector: ConnectorUsageCollector
): Promise<PassThrough> {
  const { signal, timeout, ...rest } = body;
```
So we can probably leverage that.
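For example, something along these lines (a sketch: the `abortSignal` option is hypothetical here; exposing it is what #200757 discusses):

```ts
const controller = new AbortController();

const events$ = inferenceClient.chatComplete({
  connectorId,
  messages,
  stream: true,
  abortSignal: controller.signal, // hypothetical option, pending #200757
});

// Later, e.g. when the user navigates away:
controller.abort();
```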

pgayvallet (Contributor) commented:

I opened #200757 to discuss request cancellation. Insights very welcome.

kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue on Dec 9, 2024:

… (elastic#199286)

## Summary

Closes elastic#183245

Closes elastic#197630
[Observability AI Assistant] Partially migrate to inference client

Replacing `observabilityAIAssistantClient.chat` with `inferenceClient.chatComplete`; `observabilityAIAssistantClient.complete` does a bunch of stuff on top of `chat`. Keeping `observabilityAIAssistantClient.chat` as a wrapper for now because it also adds instrumentation and logging.

(cherry picked from commit df0dfa5)
kibanamachine added a commit that referenced this issue on Dec 9, 2024: … (#199286) (#203399)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Observability AI Assistant] migrate to inference client #197630 (#199286)](#199286)

Co-authored-by: Arturo Lidueña <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Dec 9, 2024 (elastic#199286, same commit message as above).
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/kibana that referenced this issue on Dec 10, 2024 (elastic#199286, same commit message as above).
mykolaharmash pushed a commit to mykolaharmash/kibana that referenced this issue on Dec 11, 2024 (elastic#199286, same commit message as above).
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Dec 12, 2024 (elastic#199286, same commit message as above).