-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SecuritySolution][EntityAnalytics] Risk Scoring Preview API #155966
Conversation
Adds a POST route hidden behind a feature flag. There's no actual logic in the route yet, just: * schemae for the request/response (first pass) * constants, other boilerplate necessary for the route * adds a feature flag to control the route
The API now calculates and returns scores if there's data. The happy path works but I can't guarantee much else. Most parameters are not respected.
Expresses the idea that the response can either be the simple id/index/score tuples generated by the query, or the full ECS alert/input documents depending on the parameters specified in the request.
The new parameter, enrich_inputs, allows the full documents to be returned in the response. This also simplifies the response to be a single array, as opposed to grouped by entity type as before. In order to prevent an otherwise unnecessary pass over the response data, we no longer convert the risk inputs' sort field into a top-level "score" field. We can add this back as a convenience, later.
I wrote this function and intended to extract inputs, call this function, and then enrich the original inputs with the results. I then realized that most of this could be done for us by specifying `_source: true` in the riskiest_inputs subaggregation. For now, this function is unused, but if in the future we need e.g. runtime field support for these docs, we might need to reevaluate/use this.
This is mostly a passthrough, not much to say here
Not the kibana user. Whoops!
* DRYs up some code that is (currently) identical
If specified along with the `filter` parameter, it will be appended as a range filter to the end of the other filters.
This is due to a hard cap on the number of iterations a painless script can perform. See elastic/elasticsearch#28946 for details.
For now, this will only include a single message if the number of inputs found exceeded the number that we can process. In the future, it will include detailed boosting explanations. I think.
It's not the best message for e.g. painless errors, but it's better than nothing.
Multiple filters are expressed as a boolean query object. * Removes "default" filters from route invocation, as an empty filter is not valid within elasticsearch DSL.
If that's not provided, or the dataView isn't found, we instead default to the space-aware alerts index. Previous to this commit, we were using alerts from all spaces.
I have no idea if this will affect performance, but we don't need it.
If 'host' or 'user' is provided, only identifiers of that type will be returned. If unspecified, both will be returned. If an unknown value is provided, no results are currently returned.
If specified, the API will return the full request/response being executed. This also nests the risk scores under a `scores` key in the response.
description: "Configuration used to tune risk scoring. Weights can be used to change the score contribution of risk inputs for hosts and users at both a global level and also for Risk Input categories (e.g. 'alerts')." | ||
type: object | ||
required: | ||
- type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand 😊:
When the weight property is not provided in the request, how is the score calculated?
Additionally, I noticed that when users include the weight property, they are required to specify a type property. By examining the schema, it appears that the type property can be any string. I'm curious about how we handle this type property. Is it unique for each calculation, or do users define it globally and reference it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://github.com/elastic/kibana/pull/155966/files#diff-13daee434883598231f7bf1e9acb58059d4ba751440ddcefa2c478453e9405c0R12, I observed that we define two types, namely RISK_CATEGORY_WEIGHT_TYPE and GLOBAL_IDENTIFIER_WEIGHT_TYPE. I'm uncertain about the behavior if a user were to provide a different value for the type property. Should we include validation to handle such cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the weight property is not provided in the request, how is the score calculated?
If a weight is not specified, it's assumed to be 1. You can see that behavior in the tests for buildWeightingOfScoreByCategory
. In the case of a no global weighting, we just omit it, since multiplying by 1 is a no-op.
Should we include validation to handle such cases?
I've written some custom io-ts for most of these parameters, now; please give this another pass to see if there are any outstanding questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done 👏🏻 thank you!!
RiskScoresRequest: | ||
type: object | ||
properties: | ||
after_keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some investigation 😅, I noticed that the after_key
is being used for pagination in the Elasticsearch aggregation query. This led me to question whether this API is intended for internal use only and primarily targeted at users who are knowledgeable about Elasticsearch queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I don't expect users to be consuming this API themselves (although they can). The pagination will be handled by the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: after more discussion I've decided:
- This endpoint is going to remain internal for now, but
- In developing the public API I think it makes sense to restrict users to requesting a single type of entity, which in turn will make the pagination much simpler. I'll probably update this endpoint in the same way unless the UI ends up needing to make a single request for some reason.
- Once the above is done, we can decide whether this endpoint should be made public, or now.
This adds more realistic validations and accompanying tests to explain the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
These enums were moved to a more central location.
* Updates functions and helpers to accept these generated types instead of our static ones * Updates some helper methods to be type predicates * Exports more generated types
This defines identifierWeights slightly differently to allow arbitrary access by identifierType, e.g. `weight[identifierType]`. This change has no effect on the validation (as evidenced by the unit tests), but causes the generated type to effectively say "you can always access host and user keys, although they may be undefined"
* Renames from category_weights to risk_weights to be more general and more consistent with common/risk_engine * Moves a few helpers specific to after_keys and identifier_types into helpers.ts
As opposed to defaulting to a different data source, which would be unexpected.
enumName: string, | ||
theEnum: Record<string, EnumType> | ||
): t.Type<EnumType, EnumType, unknown> { | ||
const isEnumValue = (input: unknown): input is EnumType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks Ryland!!
Appreciate addressing the comments, organising the commits, and the overallPR structure.
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Page load bundle
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @rylnd |
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](elastic#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](elastic#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
## Summary * Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here * As with the [preview route](elastic#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag * We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically. * Updates to the /preview route: * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query. * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data. * Mappings created by the [Data Client](elastic#158422) have been updated to be aligned to the ECS risk fields (linked above) * Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here. ## Things to review * [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home) * Preview API and related UI still works as expected * Calculation/Persistence API correctly bootstraps/persists data * correct mappings/ILM are created * things work in non-default spaces ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios ### Risk Matrix Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Summary
This PR adds a new Risk Scoring API endpoint. Its functionality is meant to replace the current transform-based solution.
Contents of this PR:
riskScoringRoutesEnabled
/internal/risk_scores/preview
Current behavior, and short-term plans
The endpoint as specified in this branch is read-only. When the endpoint is hit, it triggers some aggregations in elasticsearch, and a formatted response is returned; there is no persistence at this time. This endpoint was originally written as a POC to demonstrate the new Risk Engine's functionality, but it will now drive the Preview Risk Scoring feature.
The main path for the Risk Engine is going to be a scheduled task that calculates Risk Scores and writes them to a persistent datastream that we own. (https://github.com/elastic/security-team/issues/6450). To accomplish this, we will decompose the full functionality of this endpoint into constituent pieces (i.e.
calculate | persist, get
)How to review
I've created a Postman collection that can be used to exercise this endpoint. It was generated by Postman from the OpenAPI spec, and modified by me to contain a valid subset of request parameters; please peruse the spec and/or feel free to generate your own scripts/tools from the spec.
Review against the PR instance
I've created a demo instance containing the code on this branch, along with some realistic(ish) alert data (~200k alerts). While you can use this instance as a convenience, you will need to set up kibana-remote-dev and forward ports in order to be able to access the instance's API from a local machine:
GITHUB_USERNAME=rylnd
./ports rd-rylnd-pr-155966-risk-score-api
http://localhost:5601
, which will be forwarded to the cloud instance via the previous commandReview manually
host.name
s anduser.name
s from those alerts have been aggregated into these "risk scores" in the responseWhat to review
after_keys
stuff)?TODO (for @rylnd)
description
s to the OpenAPI docsChecklist
For maintainers
Related ticket: https://github.com/elastic/security-team/issues/4211