
[Obs AI Assistant] Instructions & Claude improvements #181058

Merged

Conversation

@dgieselaar (Member) commented Apr 17, 2024

When we send over a conversation to the LLM for completion, we include a system message. System messages are a way for the consumer (in this case, us as developers) to control the LLM's behavior.

This system message was previously constructed using a concept called ContextDefinition - originally this was a way to define a set of functions and behavior for a specific context, e.g. core functionality, APM-specific functionality, platform-specific functionality, etc. However, we never actually did anything with this, and much of its intended functionality is now captured by the screen context API.

In #179736, we added user instructions, which are ways for the user to control the Assistant's behavior by appending to the system message we construct from the registered context definitions.

With this PR, we are making several changes:

  • Remove the concept of context definitions entirely
  • Replace it with registerInstruction, which allows the consumer to register pieces of text that will be included in the system message.
  • registerInstruction also takes a callback. That callback receives the available function names for that specific chat request. For instance, when we reach the function call limit, the LLM will have no functions to call. This allows consumers to tailor their instructions to this specific scenario, which somewhat limits the possibility of the LLM calling a function it is not allowed to call - Claude is especially prone to this (likely related to the fact that we use simulated function calling).
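A rough sketch of what the registerInstruction API described above could look like - the names and shapes here are illustrative, not the actual Kibana interfaces:

```typescript
// Hypothetical sketch of registerInstruction; illustrative only.
type InstructionCallback = (params: { availableFunctionNames: string[] }) => string | undefined;
type Instruction = string | InstructionCallback;

const instructions: Instruction[] = [];

function registerInstruction(instruction: Instruction): void {
  instructions.push(instruction);
}

// "Materialize" the registered instructions into a system message for a
// single chat invocation, given the functions available for that request.
function getSystemMessage(availableFunctionNames: string[]): string {
  return instructions
    .map((instruction) =>
      typeof instruction === 'function' ? instruction({ availableFunctionNames }) : instruction
    )
    .filter((text): text is string => Boolean(text))
    .join('\n\n');
}

// A consumer can adapt its instruction to the function-call-limit scenario:
registerInstruction(({ availableFunctionNames }) =>
  availableFunctionNames.length === 0
    ? 'Do not call any functions. Answer in text only.'
    : `You can call the following functions: ${availableFunctionNames.join(', ')}.`
);
```

The callback form is what allows the "no functions left" case to produce a different instruction than the normal case.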

This leads to the following functional changes:

  • A system message is now constructed by combining the registered instructions (system-specific) with the knowledge base and request instructions (user-specific)
  • GET /internal/observability_ai_assistant/functions no longer returns the contexts. Instead it returns the system message
  • GET /internal/observability_ai_assistant/chat/complete now creates a system message at the start, and overrides the system message from the request.
  • For each invocation of chat, it re-calculates the system message by "materializing" the registered instructions with the available function names for that chat invocation

Additionally, I've made some attempted improvements to simulated function calling:

  • simplified the system message
  • more emphasis on generating valid JSON (e.g. I saw multiline delimiters """ which are not supported)
  • more emphasis on not providing any input if the function does not accept any parameters (e.g. Claude was trying to provide entire search requests or SPL-like query strings as input, which led to hallucinations)

There are also some other changes, which I've commented on in the file changes.

Addendum: I have pushed some more changes, related to the evaluation framework (and running it with Claude). Will comment inline in 9ebd207 (#181058).

@apmmachine (Contributor):

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

concatenatedMessage: ConcatenatedMessage
) => Promise<ConcatenatedMessage>;

function mergeWithEditedMessage(
dgieselaar (Member Author):
unrelated change, I'm not sure if mergeMap is supposed to take an async callback, and what happens when the promise is rejected, so I've opted to make it more explicit by using RxJS's from, that converts a promise (amongst other things) into an Observable.

let esqlQuery = correctCommonEsqlMistakes(msg.message.content, resources.logger).match(
/```esql([\s\S]*?)```/
)?.[1];
esqlQuery = await correctQueryWithActions(esqlQuery ?? '');
dgieselaar (Member Author):
I have temporarily disabled this as I saw some incorrect fixes being applied which led to syntax errors. E.g.:

   | WHERE @timestamp >= NOW() - 15 minutes

became:

   | WHERE `@timestamp >= NOW(`) - 15 minutes

@@ -127,6 +130,14 @@ export function MessageText({ loading, content, onActionClick }: Props) {
processingPlugins[1][1].components = {
...components,
cursor: Cursor,
codeBlock: (props) => {
dgieselaar (Member Author):
this ensures we can also render things like kql and json

);
}

let storedSystemMessage: string = ''; // will be set as soon as kb instructions are loaded
dgieselaar (Member Author):
I'm not super happy about this, but let's tackle it in #180633.

Member:
Yeah, this is confusing and feels error prone

dgieselaar (Member Author):
I will open a PR after this that fixes it d005850 (#181255)

@dgieselaar dgieselaar added release_note:skip Skip the PR/issue when compiling release notes v8.14.0 labels Apr 18, 2024
@dgieselaar dgieselaar marked this pull request as ready for review April 18, 2024 07:35
@dgieselaar dgieselaar requested review from a team as code owners April 18, 2024 07:35
@cauemarcondes (Contributor) left a comment:

APM changes LGTM

@botelastic botelastic bot added Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team labels Apr 18, 2024
@elasticmachine (Contributor):

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@dgieselaar (Member Author):

/ci

source$: Observable<StreamingChatResponseEvent>
): Observable<Exclude<T, ChatCompletionErrorEvent>> => {
return source$.pipe(
export function throwSerializedChatCompletionErrors<
dgieselaar (Member Author):
Simplifying the type

@@ -38,7 +38,8 @@ export function registerGetApmDownstreamDependenciesFunction({
},
'service.environment': {
type: 'string',
description: 'The environment that the service is running in',
description:
'The environment that the service is running in. Leave empty to query for all environments.',
dgieselaar (Member Author):
it was hallucinating * here.

.map((line) => JSON.parse(line) as T | BufferFlushEvent)
),
throwSerializedChatCompletionErrors(),
retry({
dgieselaar (Member Author):
Retry, but only when it's an internal error or an Axios error.

Contributor:
Do we have examples of the errors returned here ? Wondering if 1 retry is enough

dgieselaar (Member Author):
429s, for instance, or an internal server error (i.e., something that is on us to fix). I think one retry is good enough in most cases. I also don't want it to retry too often on a 429, and implementing an incremental backoff mechanism is probably not worth it at this point; it would just make the test take a very long time.

),
that.log.info('Chat', name);

const chat$ = defer(() => {
dgieselaar (Member Author):
moving this into defer() allows it to be retryable

@@ -397,7 +446,9 @@ export class KibanaClient {

This is the conversation:

${JSON.stringify(messages)}`,
${JSON.stringify(
messages.map((msg) => pick(msg, 'content', 'name', 'function_call', 'role'))
dgieselaar (Member Author):
make sure we don't send over data etc.
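Standalone sketch of the change above: keep only the conversational fields before serializing, so large data payloads attached to messages are not sent to the LLM. Kibana uses lodash's pick; a minimal stand-in is inlined here, and the Message shape is assumed for illustration:

```typescript
// Minimal stand-in for lodash's pick.
function pick<T extends object, K extends keyof T>(obj: T, ...keys: K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    if (key in obj) {
      result[key] = obj[key];
    }
  }
  return result;
}

// Assumed message shape for this illustration.
interface Message {
  role: string;
  content?: string;
  name?: string;
  function_call?: { name: string };
  data?: unknown; // potentially large payloads we do not want to serialize
}

const messages: Message[] = [
  { role: 'assistant', content: 'done', data: { hugeDocument: 'lots of raw source data' } },
];

// Only the conversational fields survive serialization.
const serialized = JSON.stringify(
  messages.map((msg) => pick(msg, 'content', 'name', 'function_call', 'role'))
);
```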

@@ -46,7 +46,7 @@ export function registerElasticsearchFunction({
body,
});

return { content: response };
return { content: { response } };
dgieselaar (Member Author):
Sometimes transport.request returns plain text, so we need to wrap it in an object; otherwise we'll get an error somewhere else where we expect it to be an object.


Additionally, you can use the "context" function to retrieve relevant information from the knowledge database.`);
if (availableFunctionNames.includes('summarize')) {
instructions.push(`You can use the "summarize" functions to store new information you have learned in a knowledge database.
dgieselaar (Member Author):
toned this down - Claude is too eager to do this.

Comment on lines +46 to +48
const match = buffer.match(
/<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s
);
dgieselaar (Member Author):
slightly more relaxed parsing. sometimes Claude does things like:

<|tool_use_start|>
```json
{ "name": "query", "input": {} }
```
```esql
SELECT FROM foo
```
<|tool_use_end|>
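Applied to a buffered chunk, the relaxed pattern quoted above tolerates exactly that kind of trailing noise. This is a standalone illustration, not the surrounding Kibana streaming code:

```typescript
// The non-greedy JSON capture plus the trailing `.*` lets extra fenced
// blocks between the tool-use markers be ignored.
const TOOL_USE_REGEX = /<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s;

function extractToolUse(buffer: string): { name: string; input: unknown } | undefined {
  const match = buffer.match(TOOL_USE_REGEX);
  if (!match) {
    return undefined;
  }
  return JSON.parse(match[1]);
}

// Claude sometimes appends extra fenced blocks inside the markers:
const buffer = [
  '<|tool_use_start|>',
  '```json',
  '{ "name": "query", "input": {} }',
  '```',
  '```esql',
  'SELECT FROM foo',
  '```',
  '<|tool_use_end|>',
].join('\n');

console.log(extractToolUse(buffer)); // only the JSON payload is parsed
```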

import { isFunctionNotFoundError } from '../../../common/conversation_complete';
import { emitWithConcatenatedMessage } from '../../../common/utils/emit_with_concatenated_message';

export function catchFunctionLimitExceededError(): OperatorFunction<
dgieselaar (Member Author):
This removes the function call from the message if the function limit has been exceeded and the LLM tries to call a function anyway. It's a last resort.
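A simplified, synchronous illustration of that last-resort behavior (the real code is an RxJS operator; the types here are assumed shapes, not the actual Kibana ones):

```typescript
// Assumed shape of a concatenated message for this illustration.
interface ConcatenatedMessage {
  message: {
    content: string;
    function_call?: { name: string; arguments: string };
  };
}

// If the function-call limit has been exceeded but the LLM emitted a
// function call anyway, drop the call and keep the text content.
function stripDisallowedFunctionCall(
  msg: ConcatenatedMessage,
  functionLimitExceeded: boolean
): ConcatenatedMessage {
  if (!functionLimitExceeded || !msg.message.function_call) {
    return msg;
  }
  const { function_call: _, ...rest } = msg.message;
  return { message: rest };
}
```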

@@ -108,5 +108,24 @@ describe('correctCommonEsqlMistakes', () => {
| WHERE statement LIKE "SELECT%"
| STATS avg_duration = AVG(duration)`
);

expectQuery(
dgieselaar (Member Author):
make sure it works with multiline commands.


Even if the "context" function was used before that, follow it up with the "query" function. If a query fails, do not attempt to correct it yourself. Again you should call the "query" function,
even if it has been called before.
return client.getKnowledgeBaseStatus().then((response) => {
Member:
nit

Suggested change
return client.getKnowledgeBaseStatus().then((response) => {
const { ready: isReady } = client.getKnowledgeBaseStatus();

dgieselaar (Member Author):
Agreed, fixed

Comment on lines +21 to +32
const getCodeBlockClassName = (theme: UseEuiTheme) => css`
background-color: ${theme.euiTheme.colors.lightestShade};
.euiCodeBlock__pre {
margin-bottom: 0;
padding: ${theme.euiTheme.size.m};
min-block-size: 48px;
}
.euiCodeBlock__controls {
inset-block-start: ${theme.euiTheme.size.m};
inset-inline-end: ${theme.euiTheme.size.m};
}
`;
Member:
I always get suspicious when I see custom css. Is this an oversight from EUI or is our use case different?

Member:
Ah, it's this change

(screenshots of the rendered code blocks)

@@ -125,7 +125,7 @@ function runEvaluations() {

const mocha = new Mocha({
grep: argv.grep,
timeout: '5m',
timeout: '10m',
Member:

Things taking longer now?

dgieselaar (Member Author):
yes, due to retries

@@ -173,7 +173,9 @@ describe('processBedrockStream', () => {
);
}

await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(`"no elements in sequence"`);
await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(
`"Unexpected token 'i', \\"invalid json\\" is not valid JSON"`
Member:
This looks like the right error message. Was it incorrect before or what changed?

dgieselaar (Member Author):
It was incorrect before - the outcome was the same (it failed), but the error was not propagated, leading to a different RxJS error.

prompt: 100,
total: 102,
prompt: 156,
total: 158,
Member:
I like that we actually test the token count but I can see it being slightly annoying having to manually update

dgieselaar (Member Author):
Yeah honestly I don't mind, because I noticed a bug 😄 The fix for that is in a follow-up PR


@dgieselaar dgieselaar enabled auto-merge (squash) April 23, 2024 17:45
@dgieselaar (Member Author):

@elasticmachine merge upstream

@kibana-ci (Collaborator):

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
observabilityAIAssistant 89 88 -1

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
observabilityAIAssistant 251 247 -4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observabilityAIAssistant 16.9KB 16.9KB -25.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
observabilityAIAssistant 25 24 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
observabilityAIAssistant 45.6KB 45.8KB +149.0B
Unknown metric groups

API count

id before after diff
observabilityAIAssistant 253 249 -4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar dgieselaar merged commit ba76b50 into elastic:main Apr 24, 2024
20 checks passed
@kibanamachine (Contributor):

💔 All backports failed

Status Branch Result
8.14 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 181058

Questions ?

Please refer to the Backport tool documentation

@dgieselaar (Member Author):

💚 All backports created successfully

Status Branch Result
8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Apr 30, 2024
(cherry picked from commit ba76b50)
dgieselaar added a commit that referenced this pull request May 1, 2024
#182149)

# Backport

This will backport the following commits from `main` to `8.14`:
- [[Obs AI Assistant] Instructions & Claude improvements
(#181058)](#181058)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Labels
apm:review release_note:skip Skip the PR/issue when compiling release notes Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.14.0 v8.15.0
9 participants