[Obs AI Assistant] Instructions & Claude improvements #181058
Conversation
concatenatedMessage: ConcatenatedMessage
) => Promise<ConcatenatedMessage>;

function mergeWithEditedMessage(
Unrelated change: I'm not sure if `mergeMap` is supposed to take an async callback, or what happens when the promise is rejected, so I've opted to make it more explicit by using RxJS's `from`, which converts a promise (among other things) into an Observable.
let esqlQuery = correctCommonEsqlMistakes(msg.message.content, resources.logger).match(
  /```esql([\s\S]*?)```/
)?.[1];
esqlQuery = await correctQueryWithActions(esqlQuery ?? '');
I have temporarily disabled this as I saw some incorrect fixes being applied which led to syntax errors. E.g.:
| WHERE @timestamp >= NOW() - 15 minutes
became:
| WHERE `@timestamp >= NOW(`) - 15 minutes
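For reference, the extraction step in the snippet above can be demonstrated in isolation (the sample content is made up):

```typescript
// Pull the ES|QL query out of a fenced ```esql block; the lazy [\s\S]*?
// group spans newlines without needing the /s flag.
const fence = '```';
const content = `Here is a query:\n${fence}esql\nFROM logs | LIMIT 10\n${fence}`;

const esqlQuery = content.match(/```esql([\s\S]*?)```/)?.[1]?.trim();
console.log(esqlQuery); // prints "FROM logs | LIMIT 10"
```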
@@ -127,6 +130,14 @@ export function MessageText({ loading, content, onActionClick }: Props) {
processingPlugins[1][1].components = {
  ...components,
  cursor: Cursor,
  codeBlock: (props) => {
This ensures we can also render things like `kql` and `json`.
);
}

let storedSystemMessage: string = ''; // will be set as soon as kb instructions are loaded
I'm not super happy about this, but let's tackle it in #180633.
Yeah, this is confusing and feels error prone
I will open a PR after this that fixes it: d005850 (#181255)
APM changes LGTM
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
/ci
source$: Observable<StreamingChatResponseEvent>
): Observable<Exclude<T, ChatCompletionErrorEvent>> => {
  return source$.pipe(
export function throwSerializedChatCompletionErrors<
Simplifying the type
@@ -38,7 +38,8 @@ export function registerGetApmDownstreamDependenciesFunction({
},
'service.environment': {
  type: 'string',
  description: 'The environment that the service is running in',
  description:
    'The environment that the service is running in. Leave empty to query for all environments.',
It was hallucinating `*` here.
.map((line) => JSON.parse(line) as T | BufferFlushEvent)
),
throwSerializedChatCompletionErrors(),
retry({
Retry, but only when it's an internal error or an Axios error.
Do we have examples of the errors returned here? Wondering if 1 retry is enough.
429s, for instance, or an internal server error (i.e., something that is on us to fix). I think one retry is good enough in most cases. I also don't want it to retry too often on a 429, and implementing an incremental backoff mechanism is probably not worth it at this point; it would just make the tests take a very long time.
),
that.log.info('Chat', name);

const chat$ = defer(() => {
Moving this into `defer()` allows it to be retryable.
@@ -397,7 +446,9 @@ export class KibanaClient {

This is the conversation:

${JSON.stringify(messages)}`,
${JSON.stringify(
  messages.map((msg) => pick(msg, 'content', 'name', 'function_call', 'role'))
Make sure we don't send over `data` etc.
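A minimal hand-rolled stand-in for the `pick` call (lodash's in the diff), showing how serialization drops internal fields; the `data` payload here is an assumed example:

```typescript
// Hand-rolled equivalent of lodash's pick(), for illustration only.
const pick = (obj: Record<string, unknown>, ...keys: string[]) =>
  Object.fromEntries(keys.filter((k) => k in obj).map((k) => [k, obj[k]]));

const msg = {
  role: 'user',
  content: 'hello',
  data: { internal: 'must not reach the prompt' }, // assumed internal field
};

console.log(JSON.stringify(pick(msg, 'content', 'name', 'function_call', 'role')));
// prints {"content":"hello","role":"user"}
```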
@@ -46,7 +46,7 @@ export function registerElasticsearchFunction({
  body,
});

return { content: response };
return { content: { response } };
Sometimes `transport.request` returns plaintext, so we need to wrap it in an object; otherwise we'll hit an error somewhere else where we expect it to be an object.
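For instance (sample values made up):

```typescript
// Wrapping the transport response guarantees downstream code always receives
// an object, even when Elasticsearch returns plaintext (e.g. _cat APIs).
const wrap = (response: unknown) => ({ content: { response } });

console.log(JSON.stringify(wrap('green open logs-2024')));
// prints {"content":{"response":"green open logs-2024"}}
console.log(JSON.stringify(wrap({ took: 3 })));
// prints {"content":{"response":{"took":3}}}
```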
Additionally, you can use the "context" function to retrieve relevant information from the knowledge database.`);
if (availableFunctionNames.includes('summarize')) {
  instructions.push(`You can use the "summarize" functions to store new information you have learned in a knowledge database.
toned this down - Claude is too eager to do this.
const match = buffer.match(
  /<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s
);
slightly more relaxed parsing. sometimes Claude does things like:
<|tool_use_start|>
```json
{ "name": "query", "input": {} }
```
```esql
SELECT FROM foo
```
<|tool_use_end|>
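Applied to a buffer like the one above (sample content reconstructed for illustration), the relaxed pattern still recovers the JSON payload:

```typescript
// Recreate a Claude-style response where extra content follows the JSON
// block before <|tool_use_end|>. The /s flag lets `.` span newlines.
const buffer = [
  '<|tool_use_start|>',
  '```json',
  '{ "name": "query", "input": {} }',
  '```',
  '```esql',
  'FROM foo',
  '```',
  '<|tool_use_end|>',
].join('\n');

const match = buffer.match(
  /<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s
);
console.log(match?.[1]); // prints { "name": "query", "input": {} }
```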
import { isFunctionNotFoundError } from '../../../common/conversation_complete';
import { emitWithConcatenatedMessage } from '../../../common/utils/emit_with_concatenated_message';

export function catchFunctionLimitExceededError(): OperatorFunction<
This removes the function call from the message if the function limit has been exceeded and the LLM tries to call a function anyway. It's a last resort.
@@ -108,5 +108,24 @@ describe('correctCommonEsqlMistakes', () => {
  | WHERE statement LIKE "SELECT%"
  | STATS avg_duration = AVG(duration)`
);

expectQuery(
make sure it works with multiline commands.
Even if the "context" function was used before that, follow it up with the "query" function. If a query fails, do not attempt to correct it yourself. Again you should call the "query" function, even if it has been called before.
return client.getKnowledgeBaseStatus().then((response) => {
nit
return client.getKnowledgeBaseStatus().then((response) => {
const { ready: isReady } = client.getKnowledgeBaseStatus();
Agreed, fixed
const getCodeBlockClassName = (theme: UseEuiTheme) => css`
  background-color: ${theme.euiTheme.colors.lightestShade};
  .euiCodeBlock__pre {
    margin-bottom: 0;
    padding: ${theme.euiTheme.size.m};
    min-block-size: 48px;
  }
  .euiCodeBlock__controls {
    inset-block-start: ${theme.euiTheme.size.m};
    inset-inline-end: ${theme.euiTheme.size.m};
  }
`;
I always get suspicious when I see custom css. Is this an oversight from EUI or is our use case different?
@@ -125,7 +125,7 @@ function runEvaluations() {

const mocha = new Mocha({
  grep: argv.grep,
  timeout: '5m',
  timeout: '10m',
Things taking longer now?
yes, due to retries
@@ -173,7 +173,9 @@ describe('processBedrockStream', () => {
  );
}

await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(`"no elements in sequence"`);
await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(
  `"Unexpected token 'i', \\"invalid json\\" is not valid JSON"`
This looks like the right error message. Was it incorrect before or what changed?
It was incorrect before: the outcome was the same (it failed), but the error was not propagated, leading to a different RxJS error.
prompt: 100,
total: 102,
prompt: 156,
total: 158,
I like that we actually test the token count, but I can see it being slightly annoying having to manually update it.
Yeah honestly I don't mind, because I noticed a bug 😄 The fix for that is in a follow-up PR
@elasticmachine merge upstream
💚 Build Succeeded
💔 All backports failed
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
When we send over a conversation to the LLM for completion, we include a system message. System messages are a way for the consumer (in this case, us as developers) to control the LLM's behavior.

This system message was previously constructed using a concept called `ContextDefinition`. Originally this was a way to define a set of functions and behavior for a specific context, e.g. core functionality, APM-specific functionality, platform-specific functionality, etc. However, we never actually did anything with this, and much of its intended functionality is now captured by the screen context API. In elastic#179736, we added user instructions, which are ways for the user to control the Assistant's behaviour by appending to the system message we construct with the registered context definitions.

With this PR, we are making several changes:

- Remove the concept of context definitions entirely.
- Replace it with `registerInstruction`, which allows the consumer to register pieces of text that will be included in the system message.
- `registerInstruction` _also_ takes a callback. That callback receives the available function names for that specific chat request. For instance, when we reach the function call limit, the LLM will have no functions to call. This allows consumers to cater their instructions to this specific scenario, which somewhat limits the possibility of the LLM calling a function that it is not allowed to call. Claude is especially prone to this (likely related to the fact that we use simulated function calling).

This leads to the following functional changes:

- A system message is now constructed by combining the registered instructions (system-specific) with the knowledge base and request instructions (user-specific).
- `GET /internal/observability_ai_assistant/functions` no longer returns the contexts. Instead it returns the system message.
- `GET /internal/observability_ai_assistant/chat/complete` now creates a system message at the start, and overrides the system message from the request.
- For each invocation of `chat`, it re-calculates the system message by "materializing" the registered instructions with the available function names for that chat invocation.

Additionally, I've made some attempted improvements to simulated function calling:

- Simplified the system message.
- More emphasis on generating valid JSON (e.g. I saw multiline delimiters `"""`, which are not supported).
- More emphasis on not providing any input if the function does not accept any parameters. E.g. Claude was trying to provide entire search requests or SPL-like query strings as input, which led to hallucinations.

There are also some other changes, which I've commented on in the file changes.

**Addendum: I have pushed some more changes, related to the evaluation framework (and running it with Claude). Will comment inline in [`9ebd207` (elastic#181058)](https://github.com/elastic/kibana/pull/181058/commits/9ebd207acd47c33077627356c464958240c9d446).**

(cherry picked from commit ba76b50)

# Backport (#182149)

This will backport the following commits from `main` to `8.14`:

- [[Obs AI Assistant] Instructions & Claude improvements (#181058)](#181058)

### Questions?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
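The instruction-registration flow described in the PR can be sketched roughly like this (simplified, hypothetical shapes, not the actual plugin API): static instructions and callbacks are registered once, then "materialized" into a system message per chat invocation using the available function names.

```typescript
type InstructionOrCallback =
  | string
  | ((params: { availableFunctionNames: string[] }) => string | undefined);

const registry: InstructionOrCallback[] = [];

// Consumers register static text, or a callback that sees the available
// function names for the specific chat request.
function registerInstruction(instruction: InstructionOrCallback): void {
  registry.push(instruction);
}

registerInstruction('You are a helpful assistant for Elastic Observability.');
registerInstruction(({ availableFunctionNames }) =>
  availableFunctionNames.length === 0
    ? 'Do not call any functions; answer the user directly.'
    : undefined
);

// "Materialize" the system message for one chat invocation.
function getSystemMessage(availableFunctionNames: string[]): string {
  return registry
    .map((i) => (typeof i === 'function' ? i({ availableFunctionNames }) : i))
    .filter((text): text is string => Boolean(text))
    .join('\n\n');
}

console.log(getSystemMessage([])); // includes the "Do not call any functions" line
```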