Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Risk Score Extensions - Stage 1 #2236

Merged
merged 14 commits into from
Aug 14, 2023
Merged

Conversation

rylnd
Copy link
Contributor

@rylnd rylnd commented Jul 18, 2023

Followup to #2232 .

@rylnd rylnd requested a review from a team as a code owner July 18, 2023 23:07


### Risk Category Fields
Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other note (which I can add to the RFC, but let's discuss here): The main motivation for having separate fields here per category (as opposed to an object like in riskiest_inputs) is the ability to search/filter scores on these attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point being: if there's a better way to do this other than type: nested I'm all ears 😄

rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved
rfcs/text/0042-risk-score-extensions.md Outdated Show resolved Hide resolved
rfcs/text/0042-risk-score-extensions.md Show resolved Hide resolved
rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved
rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved
rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved
rfcs/text/0042/risk.yml Outdated Show resolved Hide resolved
rylnd added 6 commits July 27, 2023 14:49
This will be useful if we have need to search on this field.
With the context/nesting of these fields under the `risk` namespace,
most of the `risk_` prefixes are redundant.

Also shortens `identifier_*` to `id_*` as the verbosity does not help
with the distinction with the `id` field. Luckily, we have ECS
descriptions to disambiguate :)
We have YAML here, let's actually leverage its features!
* Adds category-specific description to _count fields
* Conslidates the general description of Risk Categories with the
  category-specific definition
@rylnd
Copy link
Contributor Author

rylnd commented Jul 27, 2023

@ebeahan I've updated the RFC with your feedback, thank you!

rylnd added a commit to elastic/kibana that referenced this pull request Jul 28, 2023
## Summary

* Introduces a new API, POST `/api/risk_scores/calculate`, that triggers
the code introduced here
* As with the [preview
route](#155966), this endpoint is
behind the `riskScoringRoutesEnabled` feature flag
* We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly
a convenience/checkpoint for testing the existing code. The next PR will
introduce a scheduled Task Manager task that invokes this code
periodically.
* Updates to the /preview route:
* `data_view_id` is now a required parameter on both endpoints. If a
dataview is not found by that ID, the id is used as the general index
pattern to the query.
* Response has been updated to be more similar to the [ECS risk
fields](elastic/ecs#2236) powering this data.
* Mappings created by the [Data
Client](#158422) have been updated
to be aligned to the ECS risk fields (linked above)
* Adds/updates the [OpenAPI
spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml)
for these endpoints; useful starting point if you're trying to get
oriented here.


## Things to review
* [PR Demo
environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home)
* Preview API and related UI still works as expected
* Calculation/Persistence API correctly bootstraps/persists data
    * correct mappings/ILM are created
    * things work in non-default spaces
   



### Checklist


- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
achyutjhunjhunwala pushed a commit to achyutjhunjhunwala/kibana that referenced this pull request Jul 28, 2023
## Summary

* Introduces a new API, POST `/api/risk_scores/calculate`, that triggers
the code introduced here
* As with the [preview
route](elastic#155966), this endpoint is
behind the `riskScoringRoutesEnabled` feature flag
* We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly
a convenience/checkpoint for testing the existing code. The next PR will
introduce a scheduled Task Manager task that invokes this code
periodically.
* Updates to the /preview route:
* `data_view_id` is now a required parameter on both endpoints. If a
dataview is not found by that ID, the id is used as the general index
pattern to the query.
* Response has been updated to be more similar to the [ECS risk
fields](elastic/ecs#2236) powering this data.
* Mappings created by the [Data
Client](elastic#158422) have been updated
to be aligned to the ECS risk fields (linked above)
* Adds/updates the [OpenAPI
spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml)
for these endpoints; useful starting point if you're trying to get
oriented here.


## Things to review
* [PR Demo
environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home)
* Preview API and related UI still works as expected
* Calculation/Persistence API correctly bootstraps/persists data
    * correct mappings/ILM are created
    * things work in non-default spaces
   



### Checklist


- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023
## Summary

* Introduces a new API, POST `/api/risk_scores/calculate`, that triggers
the code introduced here
* As with the [preview
route](elastic#155966), this endpoint is
behind the `riskScoringRoutesEnabled` feature flag
* We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly
a convenience/checkpoint for testing the existing code. The next PR will
introduce a scheduled Task Manager task that invokes this code
periodically.
* Updates to the /preview route:
* `data_view_id` is now a required parameter on both endpoints. If a
dataview is not found by that ID, the id is used as the general index
pattern to the query.
* Response has been updated to be more similar to the [ECS risk
fields](elastic/ecs#2236) powering this data.
* Mappings created by the [Data
Client](elastic#158422) have been updated
to be aligned to the ECS risk fields (linked above)
* Adds/updates the [OpenAPI
spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml)
for these endpoints; useful starting point if you're trying to get
oriented here.


## Things to review
* [PR Demo
environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home)
* Preview API and related UI still works as expected
* Calculation/Persistence API correctly bootstraps/persists data
    * correct mappings/ILM are created
    * things work in non-default spaces
   



### Checklist


- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I overlooked before there isn't a sponsor listed. Would that be @SourinPaul?

@rylnd
Copy link
Contributor Author

rylnd commented Aug 4, 2023

LGTM, but I overlooked before there isn't a sponsor listed. Would that be @SourinPaul?

I believe that's correct; I'll update the PR and consult him. I'm also going to try to solicit another round of feedback from EA folks; if there's additional changes @ebeahan I'll re-request your review. Thanks!

@SourinPaul
Copy link
Contributor

I believe that's correct;

Yes, please. Thanks @rylnd.

@SourinPaul SourinPaul requested a review from lauravoicu August 8, 2023 17:38
@SourinPaul
Copy link
Contributor

@lauravoicu would appreciate it if you could quickly review this PR and see if we missed anything. Thanks!

Copy link
Contributor

@ajosh0504 ajosh0504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great additions! 🚀

### Risk Category Fields
Some of the context here was discussed in Stage 0; please read the above for that. More specifically, these fields seek to provide the category contributions to the score, and the number of risk inputs in that category, across each of the five proposed categories:

* `category_1_score`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do users know what Category 1, 2...etc. mean? Would it make sense for these field names to reflect what the category means, for example alert_category_score, posture_category_count etc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that too, but it seems that similar to id_field and whatnot, ideally the content of each numbered category could be changed without changing the fields? e.g. category_1 could be from alerts one day, and changed to something else in the future 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! The more specific alert_category_score etc. were actually the original proposition, but we decided to go the more generic route for future-proofing.

@susan-shu-c got the gist of it, although more accurately we want to allow extending each category's definition as opposed to outright changing it (for backwards compatibility, etc.). For example, category_1 currently includes alerts, but might be extended in the future to allow more alert-like sources.

This all was described in a little more detail in stage 0 if you're interested 👍

@rylnd rylnd merged commit a096d5c into elastic:main Aug 14, 2023
@rylnd rylnd deleted the risk-score-stage-1 branch August 14, 2023 19:05
rylnd added a commit to elastic/kibana that referenced this pull request Aug 24, 2023
## What this PR does
* Adds a new Task Manager task, `risk_engine:risk_scoring`, responsible
for invoking the `calculateAndPersistRiskScores` API defined in the risk
scoring service.
* Unlike an alerting task, we do not encrypt/persist an API key for the
user. Instead, we use the internal kibana user to query all alerts in
the current space.
* The task configuration is stored as part of the existing
`risk-engine-configuration` Saved Object
* Extends the `risk-engine-configuration` SO to include more
configuration fields
* Management of this configuration is not currently exposed to the user.
They can only enable/disable the entire "Risk Engine" on the `Settings
-> Entity Risk Score` page
* The settings currently serve mainly as the "default" values for task
execution, but also as a way for a customer/SA to modify task execution
if necessary.
* We expect to be modifying these default values before release, as part
of our planned "tuning" stage.

### How to Review
* Setup:
* The risk engine acts on Detection engine alerts, and so you will need
to create:
      1. some "source" data (logs, filebeat, auditbeat, etc)
2. Rules looking for the above "source" data, and generating alerts
* The risk engine requires two feature flags, currently:
`riskScoringPersistence` and `riskScoringRoutesEnabled`
  * You will also need a Platinum or greater license.
1. Test that the task executes correctly
1. With the above data set up, navigate to `Settings -> Entity Risk
Score` page, and enable the task by toggling `Entity risk scoring` to
`On`
1. Within a few minutes, risk scores should be written to the risk score
datastream:
        * `GET risk-score.risk-score-default/_search`
* Replace `default` with the name of your current space, as necessary.
1. Disabling/re-enabling the risk engine should trigger another
execution of the task (similar to disabling/enabling a DE rule)
1. Enable the risk engine in another space
    * The engine (and task) can be enabled/executed in any kibana space.
* Because the engine only acts upon alerts in the current space, you
will need to first ensure alerts exist in that space.
1. Validate the data/mappings of persisted risk scores
* Scores are based on the Stage 1 [ECS
RFC](elastic/ecs#2236)
* There is no UI reading from these scores, currently (but that is
introduced in #163237)
  
  

### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |

|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space.
| Low | High | Integration tests will verify that all features are still
supported in non-default Kibana Space and when user switches between
spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions
when multiple Kibana nodes are polling for the same tasks. | High | Low
| Tasks are idempotent, so executing them multiple times will not result
in logical error, but will degrade performance. To test for this case we
add plenty of unit tests around this logic and document manual testing
procedure. |
| Code should gracefully handle cases when feature X or plugin Y are
disabled. | Medium | High | Unit tests will verify that any feature flag
or plugin combination still results in our service operational. |
| [See more potential risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants