[Obs AI Assistant] Improve OOTB experience with evaluation framework #203122
Labels: `bug` (Fixes for quality problems that affect the customer experience), `Team:Obs AI Assistant` (Observability AI Assistant)

Comments
sorenlouv added the `bug` (Fixes for quality problems that affect the customer experience) label on Dec 5, 2024.
Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)
**Problem:** When running the evaluation script with the `--spaceId` flag, the script fails to recognize or apply the flag.

**Steps to reproduce:**
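Run the following command (taken verbatim from the report, wrapped here for readability):

```sh
node x-pack/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation/index.js \
  --kibana http://localhost:5601 \
  --persist \
  --spaceId some-space-id
```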
stratoula pushed a commit to stratoula/kibana that referenced this issue on Jan 2, 2025:
Closes elastic#203122

## Summary

### Problem

The Obs AI Assistant LLM evaluation framework cannot currently run successfully on the `main` branch and is missing scenarios. Problems identified:

- Unable to run the evaluation with a local Elasticsearch instance
- Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests)
- State contamination between runs makes the script throw errors when run multiple times
- Authentication issues when calling `/internal` APIs

### Solution

As part of spacetime, fixed the current issues in the LLM evaluation framework and improved and enhanced it.

#### Fixes

| Problem | Root cause | Fixed? |
|---|---|---|
| Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ |
| Alerts and APM results skipped in the final result | Most (if not all) tests were failing in the alerts and APM suites, so no final results were reported | ✅ (all test scenarios fixed) |
| State contamination between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ |
| Authentication issues when calling `/internal` APIs | The required headers were not present in the request | ✅ |

#### Enhancements / Improvements

| What was added | How it enhances the framework |
|---|---|
| New KB retrieval test in the KB scenario | More scenarios covered |
| New scenario for the `retrieve_elastic_doc` function | Covers newly added functions that were missing |
| Correct, per-scenario use of scope | The scope determines the wording of the system message. Certain scenarios (e.g. `alerts`) need to be scoped to observability to produce the best result. Previously all scenarios used the scope `all`, which is not ideal and doesn't align with the actual functionality of the AI Assistant |
| No more unnecessary errors on the console (fixed by adding guard rails, e.g. not creating a data view if it already exists) | Easier to navigate the results printed on the terminal |
| Improved README | Easier to configure and use the framework while identifying all possible options |
| Improved logging | Easier to navigate the terminal output |

### Checklist

- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---

Co-authored-by: kibanamachine <[email protected]>
benakansara pushed a commit to benakansara/kibana that referenced this issue on Jan 2, 2025.
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue on Jan 2, 2025.
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Jan 13, 2025.
viduni94 added a commit to viduni94/kibana that referenced this issue on Jan 23, 2025.
The evaluation framework is our only way to assess how well models perform tasks such as ES|QL generation and various types of function calling. It is also the easiest way for LLM providers to see how well their LLM works with the Observability AI Assistant.
It should therefore be easy to run, and it should produce minimal unrelated noise in its output.
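For context, a basic invocation of the evaluation script against a local Kibana looks like the following (only flags that appear elsewhere in this issue are shown; no other options are assumed):

```sh
node x-pack/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation/index.js \
  --kibana http://localhost:5601
```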
Problems:
Authentication issues with local Elasticsearch
It is not possible to run the script against a local Elasticsearch instance; it throws authentication errors.
Sample kibana.dev.yml:
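A minimal illustrative sketch of such a config, assuming a local Elasticsearch with basic-auth credentials (the hosts, username, and password values below are placeholders, not the exact config from the report):

```yaml
# Illustrative placeholder values - adjust for the local setup
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "changeme"
```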
State contamination between runs
The script throws errors when run multiple times. It seems some data created on the first run causes problems on subsequent runs.
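A minimal sketch of the guard-rail idea described in the referenced PR (skip creating resources that already exist, so repeated runs stay idempotent); the `KibanaClient` interface below is hypothetical and stands in for the framework's actual `callKibana` helper:

```ts
// Hypothetical client shape - not the framework's real API
interface KibanaClient {
  get(path: string): Promise<{ status: number }>;
  post(path: string, body: unknown): Promise<unknown>;
}

// Create a data view only if it does not already exist, so a second
// evaluation run does not fail on leftover state from the first.
async function ensureDataView(client: KibanaClient, id: string, title: string): Promise<void> {
  const existing = await client.get(`/api/data_views/data_view/${id}`);
  if (existing.status === 200) {
    return; // already present from a previous run - skip creation
  }
  await client.post('/api/data_views/data_view', { data_view: { id, title } });
}
```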