[RFC] Risk Score Extensions - Stage 1 #2236

rylnd · 2023-07-18T23:07:20Z

Followup to #2232 .

rylnd · 2023-07-19T16:43:10Z

rfcs/text/0042-risk-score-extensions.md

+
+
+### Risk Category Fields
+Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories:


One other note (which I can add to the RFC, but let's discuss here): The main motivation for having separate fields here per category (as opposed to an object like in riskiest_inputs) is the ability to search/filter scores on these attributes.

Point being: if there's a better way to do this other than type: nested I'm all ears 😄

rfcs/text/0042-risk-score-extensions.md

rfcs/text/0042/risk.yml

More general naming to allow for modifying/extending the definition of these fields in the future.

rfcs/text/0042-risk-score-extensions.md

rfcs/text/0042/risk.yml

This will be useful if we have need to search on this field.

With the context/nesting of these fields under the `risk` namespace, most of the `risk_` prefixes are redundant. Also shortens `identifier_*` to `id_*` as the verbosity does not help with the distinction with the `id` field. Luckily, we have ECS descriptions to disambiguate :)

We have YAML here, let's actually leverage its features!

* Adds category-specific description to _count fields * Conslidates the general description of Risk Categories with the category-specific definition

rylnd · 2023-07-27T20:20:40Z

@ebeahan I've updated the RFC with your feedback, thank you!

## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](elastic#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](elastic#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

ebeahan

LGTM, but I overlooked before there isn't a sponsor listed. Would that be @SourinPaul?

rylnd · 2023-08-04T18:36:15Z

LGTM, but I overlooked before there isn't a sponsor listed. Would that be @SourinPaul?

I believe that's correct; I'll update the PR and consult him. I'm also going to try to solicit another round of feedback from EA folks; if there's additional changes @ebeahan I'll re-request your review. Thanks!

SourinPaul · 2023-08-06T18:05:22Z

I believe that's correct;

Yes, please. Thanks @rylnd.

SourinPaul · 2023-08-08T17:39:30Z

@lauravoicu would appreciate it if you could quickly review this PR and see if we missed anything. Thanks!

ajosh0504

Great additions! 🚀

ajosh0504 · 2023-08-08T18:50:14Z

rfcs/text/0042-risk-score-extensions.md

+### Risk Category Fields
+Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories:
+
+* `category_1_score`


How do users know what Category 1, 2...etc. mean? Would it make sense for these field names to reflect what the category means, for example alert_category_score, posture_category_count etc?

I was thinking that too, but it seems that similar to id_field and whatnot, ideally the content of each numbered category could be changed without changing the fields? e.g. category_1 could be from alerts one day, and changed to something else in the future 🤔

Great question! The more specific alert_category_score etc. were actually the original proposition, but we decided to go the more generic route for future-proofing.

@susan-shu-c got the gist of it, although more accurately we want to allow extending each category's definition as opposed to outright changing it (for backwards compatibility, etc.). For example, category_1 currently includes alerts, but might be extended in the future to allow more alert-like sources.

This all was described in a little more detail in stage 0 if you're interested 👍

## What this PR does * Adds a new Task Manager task, `risk_engine:risk_scoring`, responsible for invoking the `calculateAndPersistRiskScores` API defined in the risk scoring service. * Unlike an alerting task, we do not encrypt/persist an API key for the user. Instead, we use the internal kibana user to query all alerts in the current space. * The task configuration is stored as part of the existing `risk-engine-configuration` Saved Object * Extends the `risk-engine-configuration` SO to include more configuration fields * Management of this configuration is not currently exposed to the user. They can only enable/disable the entire "Risk Engine" on the `Settings -> Entity Risk Score` page * The settings currently serve mainly as the "default" values for task execution, but also as a way for a customer/SA to modify task execution if necessary. * We expect to be modifying these default values before release, as part of our planned "tuning" stage. ### How to Review * Setup: * The risk engine acts on Detection engine alerts, and so you will need to create: 1. some "source" data (logs, filebeat, auditbeat, etc) 2. Rules looking for the above "source" data, and generating alerts * The risk engine requires two feature flags, currently: `riskScoringPersistence` and `riskScoringRoutesEnabled` * You will also need a Platinum or greater license. 1. Test that the task executes correctly 1. With the above data set up, navigate to `Settings -> Entity Risk Score` page, and enable the task by toggling `Entity risk scoring` to `On` 1. Within a few minutes, risk scores should be written to the risk score datastream: * `GET risk-score.risk-score-default/_search` * Replace `default` with the name of your current space, as necessary. 1. Disabling/re-enabling the risk engine should trigger another execution of the task (similar to disabling/enabling a DE rule) 1. Enable the risk engine in another space * The engine (and task) can be enabled/executed in any kibana space. * Because the engine only acts upon alerts in the current space, you will need to first ensure alerts exist in that space. 1. Validate the data/mappings of persisted risk scores * Scores are based on the Stage 1 [ECS RFC](elastic/ecs#2236) * There is no UI reading from these scores, currently (but that is introduced in #163237) ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: kibanamachine <[email protected]>

Risk Score Extensions, Stage 1

30453f0

rylnd requested a review from a team as a code owner July 18, 2023 23:07

Add Stage 1 link

9458719

rylnd commented Jul 19, 2023

View reviewed changes

rfcs/text/0042-risk-score-extensions.md Outdated Show resolved Hide resolved

rylnd commented Jul 19, 2023

View reviewed changes

rfcs/text/0042-risk-score-extensions.md Show resolved Hide resolved

rylnd commented Jul 19, 2023

View reviewed changes

rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved

rylnd commented Jul 19, 2023

View reviewed changes

rfcs/text/0042/risk.yml Show resolved Hide resolved

rylnd mentioned this pull request Jul 24, 2023

Risk Score Persistence API elastic/kibana#161503

Merged

4 tasks

rylnd added 3 commits July 26, 2023 16:24

Rename riskiest_inputs -> risk_inputs

b80fbcf

More general naming to allow for modifying/extending the definition of these fields in the future.

Incorporate @SourinPaul's feedback for category field descriptions

a6df8ea

Merge branch 'main' into risk-score-stage-1

800a947

rylnd mentioned this pull request Jul 27, 2023

[RFC][Stage 0] Risk Input fields #2244

Merged

ebeahan reviewed Jul 27, 2023

View reviewed changes

rylnd added 6 commits July 27, 2023 14:49

Whoops

321d648

Add match_only_text multi-field to risk.inputs.description

9b9abb5

This will be useful if we have need to search on this field.

Nest inputs.* fields under risk.inputs

c4015bb

We have YAML here, let's actually leverage its features!

More helpful category field descriptions

44081e4

* Adds category-specific description to _count fields * Conslidates the general description of Risk Categories with the category-specific definition

Merge branch 'main' into risk-score-stage-1

c944982

ebeahan approved these changes Aug 4, 2023

View reviewed changes

Declare Sourin the RFC sponsor

2cfbccb

SourinPaul requested a review from lauravoicu August 8, 2023 17:38

ajosh0504 approved these changes Aug 8, 2023

View reviewed changes

rylnd added 2 commits August 9, 2023 21:24

Merge branch 'main' into risk-score-stage-1

4ec7b37

Merge branch 'main' into risk-score-stage-1

68e57af

rylnd merged commit a096d5c into elastic:main Aug 14, 2023

rylnd deleted the risk-score-stage-1 branch August 14, 2023 19:05

rylnd mentioned this pull request Aug 14, 2023

[Entity Analytics][Risk Engine] Risk Scoring Task elastic/kibana#163216

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Risk Score Extensions - Stage 1 #2236

[RFC] Risk Score Extensions - Stage 1 #2236

rylnd commented Jul 18, 2023

rylnd Jul 19, 2023

rylnd Jul 19, 2023

rylnd commented Jul 27, 2023

ebeahan left a comment

rylnd commented Aug 4, 2023

SourinPaul commented Aug 6, 2023

SourinPaul commented Aug 8, 2023

ajosh0504 left a comment

ajosh0504 Aug 8, 2023

susan-shu-c Aug 8, 2023

rylnd Aug 10, 2023



		### Risk Category Fields
		Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories:

[RFC] Risk Score Extensions - Stage 1 #2236

[RFC] Risk Score Extensions - Stage 1 #2236

Conversation

rylnd commented Jul 18, 2023

rylnd Jul 19, 2023

Choose a reason for hiding this comment

rylnd Jul 19, 2023

Choose a reason for hiding this comment

rylnd commented Jul 27, 2023

ebeahan left a comment

Choose a reason for hiding this comment

rylnd commented Aug 4, 2023

SourinPaul commented Aug 6, 2023

SourinPaul commented Aug 8, 2023

ajosh0504 left a comment

Choose a reason for hiding this comment

ajosh0504 Aug 8, 2023

Choose a reason for hiding this comment

susan-shu-c Aug 8, 2023

Choose a reason for hiding this comment

rylnd Aug 10, 2023

Choose a reason for hiding this comment