[Security Solution][Elastic AI Assistant] Adds Model Evaluation Tooling #167220
...es/kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/evaluation_settings.tsx
...kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/use_perform_evaluation.tsx
})
.showHelpOnFail(false),
(argv) => {
// performEvaluation({ dataset: DEFAULT_DATASET, logger }).catch((err) => {
This is a placeholder, per an offline discussion.
++, this is for utilizing the `yarn evaluate-models` CLI tooling. Ended up going the UI route first for flexibility/ease of use, but this is where we'll plumb through the CLI/test tooling.
x-pack/plugins/elastic_assistant/server/routes/evaluate/post_evaluate.ts
x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets/esql_dataset.json
x-pack/plugins/elastic_assistant/server/lib/model_evaluator/evaluation.ts
Thanks @spong for providing this capability to test at scale 🙏
✅ Desk tested locally
LGTM 🚀
Pinging @elastic/security-solution (Team: SecuritySolution)
…scover link, and fixes unhandled promise exception
💛 Build succeeded, but was flaky
[Redo this PR](#167220) because [this PR](#167220), which merged shortly before, broke it and I had to fix an import.
Co-authored-by: lcawl
Summary
This PR introduces a new `internal/elastic_assistant/evaluate` route and `Evaluation` Advanced Setting within the Assistant for benchmarking and testing models, agents, and other aspects of the Assistant configuration.

Enable via the `assistantModelEvaluation` experimental feature in your `kibana.dev.yml` (and better add `discoverInTimeline` for good measure as well! :)

Then access from within the `Advanced Settings` modal in the Assistant. To use, first select your Connectors/Models, then the corresponding Agent configurations, then the model you would like to use for final evaluation and the evaluation type; if `custom`, you can specify the evaluation prompt that is sent off to the evaluator model. Finally, specify the `dataset` and the `output index` that the results should be written to, then click `Perform evaluation`.

Sample datasets can be found in `x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets`, and include:

- `esql_dataset.json`
- `query_dataset.json`
- `security_labs.json`
- `security_questions_dataset.json`
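
For readers who want a mental model of the flow described above (each selected agent/model answers every dataset entry, then an evaluator scores each answer), here is a minimal TypeScript sketch. It is not the code from this PR: the `DatasetItem` shape (`input`/`reference`), the `NamedAgent` and `Evaluator` types, `runEvaluation`, and `exactMatchEvaluator` are all hypothetical names chosen for illustration, and the exact-match scorer is only a stand-in for the evaluator model. Writing results to the output index is left out.

```typescript
// Hypothetical shapes for illustration only; none of these names come from the PR.
interface DatasetItem {
  input: string; // prompt sent to each agent
  reference: string; // expected answer the evaluator compares against
}

interface NamedAgent {
  name: string;
  run: (input: string) => Promise<string>;
}

type Evaluator = (args: {
  input: string;
  prediction: string;
  reference: string;
}) => Promise<number>;

interface EvaluationResult {
  agentName: string;
  input: string;
  prediction: string;
  reference: string;
  score: number;
}

// Run every agent against every dataset entry and score each prediction.
async function runEvaluation(
  agents: NamedAgent[],
  dataset: DatasetItem[],
  evaluate: Evaluator
): Promise<EvaluationResult[]> {
  const results: EvaluationResult[] = [];
  for (const agent of agents) {
    for (const item of dataset) {
      const prediction = await agent.run(item.input);
      const score = await evaluate({
        input: item.input,
        prediction,
        reference: item.reference,
      });
      results.push({
        agentName: agent.name,
        input: item.input,
        prediction,
        reference: item.reference,
        score,
      });
    }
  }
  return results;
}

// Stand-in for the evaluator model: 1 on an exact (trimmed) match, otherwise 0.
const exactMatchEvaluator: Evaluator = async ({ prediction, reference }) =>
  prediction.trim() === reference.trim() ? 1 : 0;
```

In this sketch the caller owns error handling: `runEvaluation` returns a promise, so a rejection from any agent or from the evaluator should be caught at the call site rather than left unhandled.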
Checklist
Delete any items that are not applicable to this PR.