
[Obs AI Assistant] Instructions & Claude improvements #181058

Merged

Conversation

@dgieselaar (Member) commented Apr 17, 2024

When we send over a conversation to the LLM for completion, we include a system message. System messages are a way for the consumer (in this case, us as developers) to control the LLM's behavior.

This system message was previously constructed using a concept called ContextDefinition - originally this was a way to define a set of functions and behavior for a specific context, e.g. core functionality, APM-specific functionality, platform-specific functionality, etc. However, we never actually did anything with this, and much of its intended functionality is now captured by the screen context API.

In #179736, we added user instructions, which are ways for the user to control the Assistant's behavior by appending to the system message we construct from the registered context definitions.

With this PR, we are making several changes:

  • Remove the concept of context definitions entirely
  • Replace it with registerInstruction, which allows the consumer to register pieces of text that will be included in the system message.
  • registerInstruction also takes a callback. That callback receives the available function names for that specific chat request. For instance, when we reach the function call limit, the LLM will have no functions to call. This allows consumers to tailor their instructions to this specific scenario, which somewhat limits the possibility of the LLM calling a function it is not allowed to call - Claude is especially prone to this (likely related to the fact that we use simulated function calling).
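A rough sketch of what the registerInstruction API described above could look like - the names and shapes here are illustrative, not the actual Kibana interfaces:

```typescript
// Hypothetical sketch of registerInstruction; illustrative only.
type InstructionCallback = (params: { availableFunctionNames: string[] }) => string | undefined;
type Instruction = string | InstructionCallback;

const instructions: Instruction[] = [];

function registerInstruction(instruction: Instruction): void {
  instructions.push(instruction);
}

// "Materialize" the registered instructions into a system message for a
// single chat invocation, given the functions available for that request.
function getSystemMessage(availableFunctionNames: string[]): string {
  return instructions
    .map((instruction) =>
      typeof instruction === 'function' ? instruction({ availableFunctionNames }) : instruction
    )
    .filter((text): text is string => Boolean(text))
    .join('\n\n');
}

// A consumer can adapt its instruction to the function-call-limit scenario:
registerInstruction(({ availableFunctionNames }) =>
  availableFunctionNames.length === 0
    ? 'Do not call any functions. Answer in text only.'
    : `You can call the following functions: ${availableFunctionNames.join(', ')}.`
);
```

The callback form is what allows the "no functions left" case to produce a different instruction than the normal case.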

This leads to the following functional changes:

  • A system message is now constructed by combining the registered instructions (system-specific) with the knowledge base and request instructions (user-specific)
  • GET /internal/observability_ai_assistant/functions no longer returns the contexts. Instead it returns the system message
  • GET /internal/observability_ai_assistant/chat/complete now creates a system message at the start, and overrides the system message from the request.
  • For each invocation of chat, it re-calculates the system message by "materializing" the registered instructions with the available function names for that chat invocation

Additionally, I've made some attempted improvements to simulated function calling:

  • simplified the system message
  • more emphasis on generating valid JSON (e.g. I saw multiline delimiters """ which are not supported)
  • more emphasis on not providing any input if the function does not accept any parameters (e.g. Claude was trying to provide entire search requests or SPL-like query strings as input, which led to hallucinations)

There are also some other changes, which I've commented on in the file changes.

Addendum: I have pushed some more changes, related to the evaluation framework (and running it with Claude). Will comment inline in 9ebd207 (#181058).

@apmmachine (Contributor):

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

concatenatedMessage: ConcatenatedMessage
) => Promise<ConcatenatedMessage>;

function mergeWithEditedMessage(
dgieselaar (Member Author):
unrelated change, I'm not sure if mergeMap is supposed to take an async callback, and what happens when the promise is rejected, so I've opted to make it more explicit by using RxJS's from, that converts a promise (amongst other things) into an Observable.

let esqlQuery = correctCommonEsqlMistakes(msg.message.content, resources.logger).match(
/```esql([\s\S]*?)```/
)?.[1];
esqlQuery = await correctQueryWithActions(esqlQuery ?? '');
dgieselaar (Member Author):
I have temporarily disabled this as I saw some incorrect fixes being applied which led to syntax errors. E.g.:

   | WHERE @timestamp >= NOW() - 15 minutes

became:

   | WHERE `@timestamp >= NOW(`) - 15 minutes

@@ -127,6 +130,14 @@ export function MessageText({ loading, content, onActionClick }: Props) {
processingPlugins[1][1].components = {
...components,
cursor: Cursor,
codeBlock: (props) => {
dgieselaar (Member Author):
this ensures we can also render things like kql and json

);
}

let storedSystemMessage: string = ''; // will be set as soon as kb instructions are loaded
dgieselaar (Member Author):
I'm not super happy about this, but let's tackle it in #180633.

Member:
Yeah, this is confusing and feels error prone

dgieselaar (Member Author):
I will open a PR after this that fixes it d005850 (#181255)

@dgieselaar dgieselaar added release_note:skip Skip the PR/issue when compiling release notes v8.14.0 labels Apr 18, 2024
@dgieselaar dgieselaar marked this pull request as ready for review April 18, 2024 07:35
@dgieselaar dgieselaar requested review from a team as code owners April 18, 2024 07:35
@cauemarcondes (Contributor) left a comment:

APM changes LGTM

@botelastic botelastic bot added Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team labels Apr 18, 2024
@elasticmachine (Contributor):

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@dgieselaar (Member Author):

/ci

source$: Observable<StreamingChatResponseEvent>
): Observable<Exclude<T, ChatCompletionErrorEvent>> => {
return source$.pipe(
export function throwSerializedChatCompletionErrors<
dgieselaar (Member Author):
Simplifying the type

@@ -38,7 +38,8 @@ export function registerGetApmDownstreamDependenciesFunction({
},
'service.environment': {
type: 'string',
description: 'The environment that the service is running in',
description:
'The environment that the service is running in. Leave empty to query for all environments.',
dgieselaar (Member Author):
it was hallucinating * here.

.map((line) => JSON.parse(line) as T | BufferFlushEvent)
),
throwSerializedChatCompletionErrors(),
retry({
dgieselaar (Member Author):
Retry, but only when it's an internal error or an Axios error.

Contributor:
Do we have examples of the errors returned here ? Wondering if 1 retry is enough

dgieselaar (Member Author):
429s, for instance, or an internal server error (i.e., something that is on us to fix). I think one retry is good enough in most cases. I also don't want it to retry too often on a 429, and implementing an incremental backoff mechanism is probably not worth it at this point; it would just make the test take a very long time.

),
that.log.info('Chat', name);

const chat$ = defer(() => {
dgieselaar (Member Author):
moving this into defer() allows it to be retryable

@@ -397,7 +446,9 @@ export class KibanaClient {

This is the conversation:

${JSON.stringify(messages)}`,
${JSON.stringify(
messages.map((msg) => pick(msg, 'content', 'name', 'function_call', 'role'))
dgieselaar (Member Author):
make sure we don't send over data etc.
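Standalone sketch of the change above: keep only the conversational fields before serializing, so large data payloads attached to messages are not sent to the LLM. Kibana uses lodash's pick; a minimal stand-in is inlined here, and the Message shape is assumed for illustration:

```typescript
// Minimal stand-in for lodash's pick.
function pick<T extends object, K extends keyof T>(obj: T, ...keys: K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    if (key in obj) {
      result[key] = obj[key];
    }
  }
  return result;
}

// Assumed message shape for this illustration.
interface Message {
  role: string;
  content?: string;
  name?: string;
  function_call?: { name: string };
  data?: unknown; // potentially large payloads we do not want to serialize
}

const messages: Message[] = [
  { role: 'assistant', content: 'done', data: { hugeDocument: 'lots of raw source data' } },
];

// Only the conversational fields survive serialization.
const serialized = JSON.stringify(
  messages.map((msg) => pick(msg, 'content', 'name', 'function_call', 'role'))
);
```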

@@ -46,7 +46,7 @@ export function registerElasticsearchFunction({
body,
});

return { content: response };
return { content: { response } };
dgieselaar (Member Author):
Sometimes transport.request returns plain text, so we need to wrap it in an object; otherwise we'll get an error somewhere else where we expect it to be an object.


Additionally, you can use the "context" function to retrieve relevant information from the knowledge database.`);
if (availableFunctionNames.includes('summarize')) {
instructions.push(`You can use the "summarize" functions to store new information you have learned in a knowledge database.
dgieselaar (Member Author):
toned this down - Claude is too eager to do this.

Comment on lines +46 to +48
const match = buffer.match(
/<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s
);
dgieselaar (Member Author):
slightly more relaxed parsing. sometimes Claude does things like:

<|tool_use_start|>
```json
{ "name": "query", "input": {} }
```
```esql
SELECT FROM foo
```
<|tool_use_end|>
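Applied to a buffered chunk, the relaxed pattern quoted above tolerates exactly that kind of trailing noise. This is a standalone illustration, not the surrounding Kibana streaming code:

```typescript
// The non-greedy JSON capture plus the trailing `.*` lets extra fenced
// blocks between the tool-use markers be ignored.
const TOOL_USE_REGEX = /<\|tool_use_start\|>\s*```json\n?(.*?)(\n```\s*).*<\|tool_use_end\|>/s;

function extractToolUse(buffer: string): { name: string; input: unknown } | undefined {
  const match = buffer.match(TOOL_USE_REGEX);
  if (!match) {
    return undefined;
  }
  return JSON.parse(match[1]);
}

// Claude sometimes appends extra fenced blocks inside the markers:
const buffer = [
  '<|tool_use_start|>',
  '```json',
  '{ "name": "query", "input": {} }',
  '```',
  '```esql',
  'SELECT FROM foo',
  '```',
  '<|tool_use_end|>',
].join('\n');

console.log(extractToolUse(buffer)); // only the JSON payload is parsed
```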

import { isFunctionNotFoundError } from '../../../common/conversation_complete';
import { emitWithConcatenatedMessage } from '../../../common/utils/emit_with_concatenated_message';

export function catchFunctionLimitExceededError(): OperatorFunction<
dgieselaar (Member Author):
This removes the function call from the message if the function limit has been exceeded and the LLM tries to call a function anyway. It's a last resort.
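A simplified, synchronous illustration of that last-resort behavior (the real code is an RxJS operator; the types here are assumed shapes, not the actual Kibana ones):

```typescript
// Assumed shape of a concatenated message for this illustration.
interface ConcatenatedMessage {
  message: {
    content: string;
    function_call?: { name: string; arguments: string };
  };
}

// If the function-call limit has been exceeded but the LLM emitted a
// function call anyway, drop the call and keep the text content.
function stripDisallowedFunctionCall(
  msg: ConcatenatedMessage,
  functionLimitExceeded: boolean
): ConcatenatedMessage {
  if (!functionLimitExceeded || !msg.message.function_call) {
    return msg;
  }
  const { function_call: _, ...rest } = msg.message;
  return { message: rest };
}
```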

@@ -108,5 +108,24 @@ describe('correctCommonEsqlMistakes', () => {
| WHERE statement LIKE "SELECT%"
| STATS avg_duration = AVG(duration)`
);

expectQuery(
dgieselaar (Member Author):
make sure it works with multiline commands.


Even if the "context" function was used before that, follow it up with the "query" function. If a query fails, do not attempt to correct it yourself. Again you should call the "query" function,
even if it has been called before.
return client.getKnowledgeBaseStatus().then((response) => {
Member:
nit

Suggested change
return client.getKnowledgeBaseStatus().then((response) => {
const { ready: isReady } = client.getKnowledgeBaseStatus();

dgieselaar (Member Author):
Agreed, fixed

Comment on lines +21 to +32
const getCodeBlockClassName = (theme: UseEuiTheme) => css`
background-color: ${theme.euiTheme.colors.lightestShade};
.euiCodeBlock__pre {
margin-bottom: 0;
padding: ${theme.euiTheme.size.m};
min-block-size: 48px;
}
.euiCodeBlock__controls {
inset-block-start: ${theme.euiTheme.size.m};
inset-inline-end: ${theme.euiTheme.size.m};
}
`;
Member:
I always get suspicious when I see custom css. Is this an oversight from EUI or is our use case different?

Member:
Ah, it's this change

(screenshots of the rendered code blocks)

@@ -125,7 +125,7 @@ function runEvaluations() {

const mocha = new Mocha({
grep: argv.grep,
timeout: '5m',
timeout: '10m',
Member:

Things taking longer now?

dgieselaar (Member Author):
yes, due to retries

@@ -173,7 +173,9 @@ describe('processBedrockStream', () => {
);
}

await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(`"no elements in sequence"`);
await expect(fn).rejects.toThrowErrorMatchingInlineSnapshot(
`"Unexpected token 'i', \\"invalid json\\" is not valid JSON"`
Member:
This looks like the right error message. Was it incorrect before or what changed?

dgieselaar (Member Author):
It was incorrect before - the outcome was the same (it failed), but the error was not propagated, leading to a different RxJS error.

prompt: 100,
total: 102,
prompt: 156,
total: 158,
Member:
I like that we actually test the token count but I can see it being slightly annoying having to manually update

dgieselaar (Member Author):
Yeah honestly I don't mind, because I noticed a bug 😄 The fix for that is in a follow-up PR


@dgieselaar dgieselaar enabled auto-merge (squash) April 23, 2024 17:45
@dgieselaar (Member Author):

@elasticmachine merge upstream

@kibana-ci (Collaborator):

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
observabilityAIAssistant 89 88 -1

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
observabilityAIAssistant 251 247 -4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observabilityAIAssistant 16.9KB 16.9KB -25.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
observabilityAIAssistant 25 24 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
observabilityAIAssistant 45.6KB 45.8KB +149.0B
Unknown metric groups

API count

id before after diff
observabilityAIAssistant 253 249 -4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar dgieselaar merged commit ba76b50 into elastic:main Apr 24, 2024
20 checks passed
@kibanamachine (Contributor):

💔 All backports failed

Status Branch Result
8.14 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 181058

Questions ?

Please refer to the Backport tool documentation

@dgieselaar (Member Author):

💚 All backports created successfully

Status Branch Result
8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Apr 30, 2024
(cherry picked from commit ba76b50)
dgieselaar added a commit that referenced this pull request May 1, 2024
#182149)

# Backport

This will backport the following commits from `main` to `8.14`:
- [[Obs AI Assistant] Instructions & Claude improvements
(#181058)](#181058)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Labels
apm:review release_note:skip Skip the PR/issue when compiling release notes Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.14.0 v8.15.0
9 participants