
[Security Solution][Elastic AI Assistant] Adds Model Evaluation Tooling #167220

Merged: 25 commits into elastic:main from assistant-esql-eval, Sep 29, 2023

@spong (Member) commented Sep 26, 2023

Summary

This PR introduces a new internal/elastic_assistant/evaluate route and an Evaluation setting in the Assistant's Advanced Settings for benchmarking and testing models, agents, and other aspects of the Assistant configuration.

Enable it via the assistantModelEvaluation experimental feature in your kibana.dev.yml, and add discoverInTimeline for good measure as well :)

xpack.securitySolution.enableExperimental: ['assistantModelEvaluation', 'discoverInTimeline']

Then open it from the Advanced Settings modal in the Assistant. To use it, first select your Connectors/Models, then the corresponding Agent configurations, then the model to use for final evaluation and the evaluation type. For a custom evaluation type, you can also specify the evaluation prompt that is sent to the evaluator model. Finally, specify the dataset and the output index the results should be written to, then click Perform evaluation.
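Taken together, these selections describe a batch of evaluation runs. The following is a minimal sketch, assuming every selected model is run with every selected agent configuration; the EvaluationSettings interface, its field names, and the planRuns helper are hypothetical illustrations, not the actual request body of the evaluate route.

```typescript
// Hypothetical model of the choices made in the Evaluation settings panel.
// Field names are illustrative assumptions, not the evaluate route's real API.
interface EvaluationSettings {
  connectorIds: string[]; // Connectors/Models under test
  agents: string[]; // Agent configurations to run
  evalModel: string; // model used for final evaluation (grading)
  evaluationType: string; // a built-in type, or "custom"
  evalPrompt?: string; // evaluation prompt, only for the custom type
  dataset: string; // dataset to run, e.g. one of the sample datasets
  outputIndex: string; // index the results are written to
}

interface EvaluationRun {
  connectorId: string;
  agent: string;
}

// Assumption: each selected model is paired with each selected agent.
function planRuns(settings: EvaluationSettings): EvaluationRun[] {
  if (settings.connectorIds.length === 0 || settings.agents.length === 0) {
    throw new Error("Select at least one connector/model and one agent.");
  }
  if (settings.outputIndex.trim() === "") {
    throw new Error("An output index is required.");
  }
  if (settings.evalPrompt !== undefined && settings.evaluationType !== "custom") {
    throw new Error("An evaluation prompt only applies to the custom evaluation type.");
  }
  // Build the cartesian product of models and agent configurations.
  const runs: EvaluationRun[] = [];
  for (const connectorId of settings.connectorIds) {
    for (const agent of settings.agents) {
      runs.push({ connectorId, agent });
    }
  }
  return runs;
}

// Placeholder connector, agent, and index names (hypothetical).
const runs = planRuns({
  connectorIds: ["connector-a", "connector-b"],
  agents: ["agent-1", "agent-2", "agent-3"],
  evalModel: "connector-a",
  evaluationType: "custom",
  evalPrompt: "Is the answer correct? Reply yes or no.",
  dataset: "esql_dataset.json",
  outputIndex: "evaluation-results",
});
console.log(runs.length); // 6
```

Under the same assumption, two models and three agent configurations yield six runs, each of which the evaluator model would then grade against the chosen dataset.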

Sample datasets can be found in x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets, and include:

  • esql_dataset.json
  • query_dataset.json
  • security_labs.json
  • security_questions_dataset.json
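Each dataset is a JSON file. The sketch below parses and validates one, assuming each file holds an array of objects with an input prompt and a reference answer; the input/reference field names and the parseDataset helper are hypothetical, so check the files in the datasets folder for their real shape before relying on it.

```typescript
// Hypothetical example shape; the real dataset files may use other field names.
interface DatasetExample {
  input: string; // question or prompt sent to the Assistant
  reference: string; // expected answer the evaluator compares against
}

// Parses dataset JSON text and rejects entries that lack the assumed fields.
function parseDataset(jsonText: string): DatasetExample[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new Error("Dataset must be a JSON array.");
  }
  return data.map((entry: unknown, i: number) => {
    const e = entry as Partial<DatasetExample> | null;
    if (
      e === null ||
      typeof e !== "object" ||
      typeof e.input !== "string" ||
      typeof e.reference !== "string"
    ) {
      throw new Error(`Entry ${i} is missing "input" or "reference".`);
    }
    return { input: e.input, reference: e.reference };
  });
}

const examples = parseDataset('[{"input":"What is ES|QL?","reference":"A query language."}]');
console.log(examples[0].reference); // A query language.
```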


@spong added the release_note:skip, backport:skip, Team: SecuritySolution, Feature:Security Assistant, and v8.11.0 labels Sep 26, 2023
@spong self-assigned this Sep 26, 2023
Review thread on the yarn evaluate-models CLI entry point (excerpt):

    })
    .showHelpOnFail(false),
    (argv) => {
    // performEvaluation({ dataset: DEFAULT_DATASET, logger }).catch((err) => {
Contributor:
This is a placeholder, per an offline discussion.

Member (author):
++. This is for the yarn evaluate-models CLI tooling. I ended up going the UI route first for flexibility and ease of use, but this is where we'll plumb through the CLI/test tooling.

@andrew-goldstein (Contributor) left a comment

Thanks @spong for providing this capability to test at scale 🙏
✅ Desk tested locally
LGTM 🚀

@spong marked this pull request as ready for review September 26, 2023 23:23
@spong requested a review from a team as a code owner September 26, 2023 23:23
@elasticmachine (Contributor):
Pinging @elastic/security-solution (Team: SecuritySolution)

@spong requested a review from a team as a code owner September 28, 2023 22:12

@kibana-ci (Collaborator):

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • FTR Configs #5 / serverless observability UI navigation navigate observability sidenav & breadcrumbs

Metrics

Module Count

Fewer modules lead to faster build times.

id before after diff
securitySolution 4557 4560 +3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 12.8MB 12.8MB +53.0KB

Page load bundle

Size of the bundles downloaded on every page load. Target size is below 100 KB.

id before after diff
securitySolution 62.7KB 62.7KB +28.0B

Unknown metric groups

ESLint disabled line counts

id before after diff
elasticAssistant 10 13 +3

Total ESLint disabled count

id before after diff
elasticAssistant 10 13 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @spong

@spong merged commit 3ba0f32 into elastic:main Sep 29, 2023
@spong deleted the assistant-esql-eval branch September 29, 2023 15:32
jbudz pushed a commit that referenced this pull request Sep 29, 2023
[Redo this PR](#167220) because
[this PR](#167220), merged shortly
before, broke it and I had to fix an import

---------

Co-authored-by: lcawl
5 participants