forked from elastic/kibana
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Security Assistant] Adds support for LangGraph evaluations (elastic#190574)

## Summary

This PR updates the existing evaluation framework to support LangGraph. The evaluation code was the last reference to the old agent executors, so we were finally able to remove those as well.

The evaluation feature remains behind a feature flag and can be enabled with the following configuration:

```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```

Once enabled, the `Evaluation` tab becomes visible in settings:

<p align="center">
  <img width="800" src="https://github.com/user-attachments/assets/8a0b8691-73a3-43b7-996b-8cc408edd5ab" />
</p>

Notes:

* We no longer write evaluation results to a local ES index. We could still do this, but most of the value comes from viewing the results in LangSmith, so I didn't re-plumb this functionality after switching over to the new LangSmith `evaluator` function.
* Support for custom datasets still needs to be added back if we find it useful. Currently only LangSmith datasets are supported. I ended up porting over the `GET datasets` API from elastic#181348 to make this more useful: the `GET evaluate` route now returns `datasets`, an array of dataset names from LangSmith.
* Some additional fields, like `size` and `alertsIndexPattern`, still needed to be ported over to the POST evaluation API. Update: these are now ported to the API and just need to be surfaced in the UI.
* `Project name` was removed from the eval UI. With the new LangSmith `evaluator`, runs no longer need to be tagged with a specific project, because they automatically show up under the `Experiments` section.
* The 'Evaluation (Optional)' section currently isn't used, so it has been removed. We can re-enable it when there is a need to run local evals on predictions outside of LangSmith.

To test, set a `Run name`, input a dataset from LangSmith (e.g. `LangGraph Eval Testing`), select a few connectors and the `DefaultAssistantGraph`, then click `Perform evaluation...`. Results will show up in LangSmith under `Datasets & Testing`.
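For illustration only, the request flow in the testing steps could be modeled client-side roughly as in the TypeScript sketch below. It is a hypothetical sketch, not the actual Kibana schemas. Only `datasets`, `size`, and `alertsIndexPattern` are named in the notes. The type names `GetEvaluateResponse` and `PostEvaluateBody`, the helper `buildEvaluateBody`, and the fields `runName`, `datasetName`, `connectorIds`, and `graphs` are all assumptions chosen to mirror the UI inputs (`Run name`, dataset, connectors, graph).

```typescript
// HYPOTHETICAL shapes for illustration; not the real Kibana schemas.
// Only `datasets`, `size`, and `alertsIndexPattern` come from the PR notes.
interface GetEvaluateResponse {
  // Dataset names fetched from LangSmith by the `GET evaluate` route
  datasets: string[];
}

interface PostEvaluateBody {
  runName: string; // assumed field for the `Run name` input
  datasetName: string; // assumed field for the LangSmith dataset
  connectorIds: string[]; // assumed field for the selected connectors
  graphs: string[]; // assumed field, e.g. ["DefaultAssistantGraph"]
  size?: number; // ported to the API per the notes; semantics assumed
  alertsIndexPattern?: string; // ported to the API per the notes
}

// Hypothetical helper: validate user input against the datasets the
// server reported before sending a POST evaluation request.
function buildEvaluateBody(
  available: GetEvaluateResponse,
  input: PostEvaluateBody,
): PostEvaluateBody {
  const runName = input.runName.trim();
  if (runName === "") {
    throw new Error("A run name is required");
  }
  if (!available.datasets.includes(input.datasetName)) {
    throw new Error(`Unknown LangSmith dataset: ${input.datasetName}`);
  }
  if (input.connectorIds.length === 0 || input.graphs.length === 0) {
    throw new Error("Select at least one connector and one graph");
  }
  // Return a copy with the normalized run name
  return { ...input, runName };
}
```

Checking the dataset name against the `datasets` list up front surfaces a typo before any model calls are made. That matters more now that only LangSmith datasets are supported.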
Note: It's easy to run into rate-limiting errors with Gemini, so keep that in mind when running larger datasets. The new LangSmith `evaluator` function has a `maxConcurrency` option that controls the maximum number of concurrent evaluations, so we can tweak it as needed. Once complete, you can compare all results side by side in LangSmith :tada:

<img width="2312" alt="image" src="https://github.com/user-attachments/assets/7ca31722-7400-4717-9735-d6c1c97b6e49">

---------

Co-authored-by: kibanamachine <[email protected]>
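The `maxConcurrency` option in the note above caps how many evaluations are in flight at once. This can help keep larger datasets under a provider's rate limit. The TypeScript sketch below illustrates the general idea with a hypothetical `mapWithConcurrency` helper. It is not LangSmith's implementation, and the helper's name and signature are assumptions.

```typescript
// HYPOTHETICAL helper illustrating a concurrency cap; not LangSmith code.
// Runs `task` over every item with at most `maxConcurrency` calls in flight
// and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  maxConcurrency: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker keeps claiming the next item until none remain. JavaScript
  // runs this synchronously between awaits, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  // Start no more workers than there are items.
  const workers = Array.from(
    { length: Math.min(maxConcurrency, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

Workers pull from a shared index rather than processing fixed-size batches. As a result, one slow model call delays only its own worker instead of holding up an entire batch.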
1 parent 77ad05e
commit c276638
Showing 35 changed files with 518 additions and 2,430 deletions.
...kages/kbn-elastic-assistant-common/impl/schemas/evaluation/get_evaluate_route.schema.yaml (17 changes: 11 additions & 6 deletions)