
[Obs AI Assistant] Improve OOTB experience with evaluation framework #203122

Closed
sorenlouv opened this issue Dec 5, 2024 · 3 comments · Fixed by #204574
Assignees
viduni94

Labels
bug (Fixes for quality problems that affect the customer experience)
Team:Obs AI Assistant (Observability AI Assistant)

Comments

sorenlouv (Member) commented Dec 5, 2024

The evaluation framework is our only way to assess how well models perform certain tasks, like ES|QL generation and various types of function calling. It is also the easiest way for LLM providers to see how well their LLM works with the Observability AI Assistant.

It should therefore be easy to run, and it should produce minimal unrelated noise in its output.

Problems:

Authentication issues with local Elasticsearch

It is not possible to run the script against a local Elasticsearch instance; it throws errors like:

```
action [indices:admin/auto_create] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [metrics-apm.internal-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]
```

Sample kibana.dev.yml:

```yaml
elasticsearch.hosts: http://localhost:9200
elasticsearch.username: kibana_system
elasticsearch.password: changeme
elasticsearch.ssl.verificationMode: none
elasticsearch.ignoreVersionMismatch: true
```
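The error means the script ends up authenticating as `kibana_system`, which lacks the index privileges needed to auto-create the APM data streams the scenarios write to. One possible workaround, assuming the script accepts inline basic-auth credentials in the `--kibana` URL (an assumption, not verified against the script):

```bash
# Assumption: inline basic-auth credentials in the --kibana URL are honored.
node x-pack/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation/index.js \
  --kibana http://elastic:changeme@localhost:5601
```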

State contamination between runs
The script throws errors when run multiple times; some data created on the first run appears to cause failures on subsequent runs.
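The follow-up PR traces part of this to `after` hooks not running and fixes it with guard rails, e.g. not creating a data view if one already exists. A minimal sketch of that kind of idempotent setup step, assuming Node 18+ (global `fetch`) and Kibana's data views API; the URL and credentials are illustrative:

```ts
// Illustrative guard rail: create a data view only if it does not already
// exist, so repeated runs don't fail on "already exists" errors.
const KIBANA_URL = 'http://localhost:5601'; // illustrative
const HEADERS = {
  'kbn-xsrf': 'true',
  'Content-Type': 'application/json',
  Authorization: 'Basic ' + Buffer.from('elastic:changeme').toString('base64'), // illustrative
};

async function ensureDataView(title: string): Promise<void> {
  // GET /api/data_views lists existing data views.
  const res = await fetch(`${KIBANA_URL}/api/data_views`, { headers: HEADERS });
  const { data_view: existing } = (await res.json()) as { data_view: Array<{ title: string }> };
  if (existing.some((dv) => dv.title === title)) {
    return; // already present; skip creation to keep runs idempotent
  }
  await fetch(`${KIBANA_URL}/api/data_views/data_view`, {
    method: 'POST',
    headers: HEADERS,
    body: JSON.stringify({ data_view: { title } }),
  });
}
```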

@sorenlouv added the bug label Dec 5, 2024
@botelastic added the needs-team label Dec 5, 2024
@sorenlouv added the Team:Obs AI Assistant label Dec 6, 2024
@botelastic removed the needs-team label Dec 6, 2024
elasticmachine (Contributor)

Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

arturoliduena (Contributor)

Problem:

When running the evaluation framework, the AI Assistant (with Gemini) encounters quota exhaustion errors.

[Screenshot: Gemini quota exhaustion error]
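Quota exhaustion comes from the LLM provider's rate limits rather than the framework itself; a common client-side mitigation is to retry with exponential backoff and jitter. An illustrative sketch (hypothetical; not part of the framework):

```ts
// Hypothetical retry helper with exponential backoff and jitter.
// In practice, only retry errors that indicate rate limiting or quota
// exhaustion (e.g. HTTP 429); other errors should fail fast.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Delay doubles each attempt, capped at 60s, scaled by random jitter.
      const delayMs = Math.min(60_000, 1_000 * 2 ** attempt) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```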

arturoliduena (Contributor)

Problem:

When running the evaluation script with the `--spaceId` flag, the provided space ID is ignored and the script does not correctly scope its operations to the specified Kibana space.

Steps to Reproduce:

Run the following command:

```bash
node x-pack/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation/index.js --kibana http://localhost:5601 --persist --spaceId some-space-id
```

The script fails to recognize or apply the `--spaceId` parameter.
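For context, Kibana scopes REST calls to a space by prefixing the request path with `/s/<space-id>` (the default space uses no prefix). A minimal sketch of a URL helper along those lines (hypothetical; not the script's actual code):

```ts
// Hypothetical helper: build a space-aware Kibana URL.
// Kibana space-scoped paths look like /s/<space-id>/api/..., while the
// default space omits the /s/<space-id> prefix.
function buildSpaceAwareUrl(kibanaBaseUrl: string, spaceId: string | undefined, path: string): string {
  const base = kibanaBaseUrl.replace(/\/+$/, '');
  const prefix = spaceId && spaceId !== 'default' ? `/s/${spaceId}` : '';
  return `${base}${prefix}/${path.replace(/^\/+/, '')}`;
}

// buildSpaceAwareUrl('http://localhost:5601', 'some-space-id', '/api/status')
// => 'http://localhost:5601/s/some-space-id/api/status'
```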

@viduni94 self-assigned this Dec 9, 2024
stratoula pushed a commit to stratoula/kibana that referenced this issue Jan 2, 2025
Closes elastic#203122

## Summary

### Problem
The Obs AI Assistant LLM evaluation framework cannot currently run successfully on the `main` branch and is missing scenarios.

Problems identified:
- Unable to run the evaluation with a local Elasticsearch instance
- Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests)
- State contamination between runs makes the script throw errors when it is run multiple times
- Authentication issues when calling `/internal` APIs

### Solution
As part of spacetime, I fixed the current issues in the LLM evaluation framework and improved and enhanced it.

#### Fixes
| Problem | Root cause | Fixed? |
|---|---|---|
| Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ |
| Alerts and APM results skipped in final result | Most (if not all) tests are failing in the alerts and APM suites, hence no final results are reported | ✅ (all test scenarios fixed) |
| State contamination between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ |
| Authentication issues when calling `/internal` APIs | The required headers are not present in the request | ✅ |
#### Enhancements / Improvements

| What was added | How it enhances the framework |
|---|---|
| New KB retrieval test in the KB scenario | More scenarios covered |
| New scenario for the `retrieve_elastic_doc` function | Covers newly added functions that were previously missing |
| Correct, per-scenario scope handling | The scope determines the wording of the system message. Certain scenarios need to be scoped to observability (e.g. `alerts`) to produce the best result; previously all scenarios used the scope `all`, which is not ideal and doesn't align with the actual functionality of the AI Assistant (see the sketch after this table) |
| Guard rails to avoid throwing unnecessary errors on the console (e.g. not creating a data view if it already exists) | Makes it easier to navigate the results printed on the terminal |
| Improved readme | Easier to configure and use the framework and to discover all available options |
| Improved logging | Easier to navigate the terminal output |
### Checklist

- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
benakansara pushed a commit to benakansara/kibana that referenced this issue Jan 2, 2025
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue Jan 2, 2025
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Jan 13, 2025
viduni94 added a commit to viduni94/kibana that referenced this issue Jan 23, 2025