-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Risk Score Extensions - Stage 1 #2236
Conversation
|
||
|
||
### Risk Category Fields | ||
Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other note (which I can add to the RFC, but let's discuss here): The main motivation for having separate fields here per category (as opposed to an object
like in riskiest_inputs
) is the ability to search/filter scores on these attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point being: if there's a better way to do this other than type: nested
I'm all ears 😄
More general naming to allow for modifying/extending the definition of these fields in the future.
This will be useful if we have need to search on this field.
With the context/nesting of these fields under the `risk` namespace, most of the `risk_` prefixes are redundant. Also shortens `identifier_*` to `id_*` as the verbosity does not help with the distinction with the `id` field. Luckily, we have ECS descriptions to disambiguate :)
We have YAML here, let's actually leverage its features!
* Adds category-specific description to _count fields * Conslidates the general description of Risk Categories with the category-specific definition
@ebeahan I've updated the RFC with your feedback, thank you! |
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](elastic#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](elastic#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](elastic#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](elastic#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I overlooked before there isn't a sponsor listed. Would that be @SourinPaul?
I believe that's correct; I'll update the PR and consult him. I'm also going to try to solicit another round of feedback from EA folks; if there's additional changes @ebeahan I'll re-request your review. Thanks! |
Yes, please. Thanks @rylnd. |
@lauravoicu would appreciate it if you could quickly review this PR and see if we missed anything. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great additions! 🚀
### Risk Category Fields | ||
Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories: | ||
|
||
* `category_1_score` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do users know what Category 1, 2...etc. mean? Would it make sense for these field names to reflect what the category means, for example alert_category_score
, posture_category_count
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that too, but it seems that similar to id_field
and whatnot, ideally the content of each numbered category could be changed without changing the fields? e.g. category_1
could be from alerts one day, and changed to something else in the future 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question! The more specific alert_category_score
etc. were actually the original proposition, but we decided to go the more generic route for future-proofing.
@susan-shu-c got the gist of it, although more accurately we want to allow extending each category's definition as opposed to outright changing it (for backwards compatibility, etc.). For example, category_1
currently includes alerts, but might be extended in the future to allow more alert-like sources.
This all was described in a little more detail in stage 0 if you're interested 👍
## What this PR does * Adds a new Task Manager task, `risk_engine:risk_scoring`, responsible for invoking the `calculateAndPersistRiskScores` API defined in the risk scoring service. * Unlike an alerting task, we do not encrypt/persist an API key for the user. Instead, we use the internal kibana user to query all alerts in the current space. * The task configuration is stored as part of the existing `risk-engine-configuration` Saved Object * Extends the `risk-engine-configuration` SO to include more configuration fields * Management of this configuration is not currently exposed to the user. They can only enable/disable the entire "Risk Engine" on the `Settings -> Entity Risk Score` page * The settings currently serve mainly as the "default" values for task execution, but also as a way for a customer/SA to modify task execution if necessary. * We expect to be modifying these default values before release, as part of our planned "tuning" stage. ### How to Review * Setup: * The risk engine acts on Detection engine alerts, and so you will need to create: 1. some "source" data (logs, filebeat, auditbeat, etc) 2. Rules looking for the above "source" data, and generating alerts * The risk engine requires two feature flags, currently: `riskScoringPersistence` and `riskScoringRoutesEnabled` * You will also need a Platinum or greater license. 1. Test that the task executes correctly 1. With the above data set up, navigate to `Settings -> Entity Risk Score` page, and enable the task by toggling `Entity risk scoring` to `On` 1. Within a few minutes, risk scores should be written to the risk score datastream: * `GET risk-score.risk-score-default/_search` * Replace `default` with the name of your current space, as necessary. 1. Disabling/re-enabling the risk engine should trigger another execution of the task (similar to disabling/enabling a DE rule) 1. Enable the risk engine in another space * The engine (and task) can be enabled/executed in any kibana space. * Because the engine only acts upon alerts in the current space, you will need to first ensure alerts exist in that space. 1. Validate the data/mappings of persisted risk scores * Scores are based on the Stage 1 [ECS RFC](elastic/ecs#2236) * There is no UI reading from these scores, currently (but that is introduced in #163237) ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: kibanamachine <[email protected]>
Followup to #2232 .