[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

cnasikas · 2024-05-17T15:32:12Z

Summary

This PR aims to decouple the feature IDs from the consumer attribute of rules and alerts.

Towards: #187202
Fixes: #181559
Fixes: #182435

Note

Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs.

Architecture

Alerting RBAC model

The Kibana security uses Elasticsearch's application privileges. This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the rule_type_id and consumer attributes to define who owns the rule and the alerts procured by the rule. To connect the rule_type_id and consumer with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the consumer should be a valid feature ID. For example, the alerting:siem.esqlRule/siem/rule/get action, means that a user with a role that grants this action can get a rule of type siem.esqlRule with consumer siem.

Problem statement

At the moment the consumer attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically:

Rule types cannot support more than one consumer.
To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents.
The API calls are feature ID-oriented and not rule-type-oriented.
The framework has to be aware of the values of the consumer attribute.
Feature IDs are tightly coupled with the alerting indices leading to bugs.
Legacy consumers that are not a valid feature anymore can cause bugs.
The framework has to be aware of legacy consumers to handle edge cases.
The framework has to be aware of specific consumers to handle edge cases.

Proposed solution

This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as:

// Old formatting
id: 'siem', <--- feature ID
alerting:['siem.queryRule']

// New formatting
id: 'siem', <--- feature ID
alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting

The new actions are constructed as alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>. For example alerting:rule-type-id/my-consumer/rule/get. The new action means that a user with a role that grants this action can get a rule of type rule-type with consumer my-consumer. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example:

Old formatting

Schema:

id: 'logs', <--- feature ID
alerting:['.es-query'] <-- rule type ID

Generated action:

alerting:.es-query/logs/rule/get

New formatting

Schema:

id: 'siem', <--- feature ID
alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting

Generated action:

alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before

In both formating the actions are the same thus breaking changes are avoided.

Alerting authorization class

The alerting plugin uses and exports the alerting authorization class (AlertingAuthorization). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the _getAuthorizedRuleTypesWithAuthorizedConsumers method which is the cornerstone of the alerting RBAC. Please review carefully.

Instantiation of the alerting authorization class

The AlertingAuthorizationClientFactory is used to create instances of the AlertingAuthorization class. The AlertingAuthorization class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the AlertingAuthorization class was assigning Promise objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the AlertingAuthorization class or any client that depends on it (getRulesClient for example) is an async operation.

Filtering

A lot of routes use the authorization class to get the authorization filter (getFindAuthorizationFilter), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them.

Legacy consumers & producer

A lot of rules and alerts have been created and are still being created from observability with the alerts consumer. When the Alerting RBAC encounters a rule or alert with alerts as a consumer it falls back to the producer of the rule type ID to construct the actions. For example if a rule with ruleTypeId: .es-query and consumer: alerts the alerting action will be constructed as alerting:.es-query/stackAlerts/rule/get where stackRules is the producer of the .es-query rule type. The producer is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the alerts consumer. To avoid breaking changes all feature privileges that specify access to rule types add the alerts consumer when configuring their alerting privileges. By moving the alerts consumer to the registration of the feature we can stop relying on the producer. The producer is not used anymore in the authorization class. In the next PRs the producer will removed entirely.

Routes

The following changes were introduced to the alerting routes:

All related routes changed to be rule-type oriented and not feature ID oriented.
All related routes support the ruleTypeIds and the consumers parameters for filtering. In all routes, the filters are constructed as ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the stackAlerts consumer even if the user has access to them.
The /internal/rac/alerts/_feature_ids route got deleted as it was not used anywhere in the codebase and it was internal.

All the changes in the routes are related to internal routes and no breaking changes are introduced.

Constants

I moved the o11y and stack rule type IDs to kbn-rule-data-utils and exported all security solution rule type IDs from kbn-securitysolution-rules. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like observablility (category or subcategory) or even apm.* and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue #187202.

Other notable code changes

Change all instances of feature IDs to rule type IDs.
isSiemRuleType: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations.
Rename alerting PluginSetupContract and PluginStartContract to AlertingServerSetup and AlertingServerStart. This made me touch a lot of files but I could not resist.
filter_consumers was mistakenly exposed to a public API. It was undocumented.
Files or functions that were not used anywhere in the codebase got deleted.
Change the returned type of the list method of the RuleTypeRegistry from Set<RegistryRuleType> to Map<string, RegistryRuleType>.
Assertion of KueryNode in tests changed to an assertion of KQL using toKqlExpression.
Removal of useRuleAADFields as it is not used anywhere.

Testing

Caution

It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider:

The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser.
The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features.
The changes in this PR are backward compatible with the previous users' permissions.

Solutions

Please test and verify that:

All the rule types you own with all possible combinations of permissions both in ESS and in Serverless.
The consumers and rule types make sense when registering the features.
The consumers and rule types that are passed to the components are the intended ones.

ResponseOps

The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test:

The rules we own with all possible combinations of permissions.
The stack alerts page and its solution filtering.
The categories filtering in the maintenance window UI.

Risks

Warning

The risks involved in this PR are related to privileges. Specifically:

Users with no privileges can access rules and alerts they do not have access to.
Users with privileges cannot access rules and alerts they have access to.

An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks.

FQA

I noticed that a lot of routes support the filter parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters?

The filter parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be alerting.attributes.rule_type_id. In this filter the consumer should know a) the name of the SO b) the keyword attributes (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it.

I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers?

This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :).

I noticed a lot of hacks like checking if the rule type is siem. Should not remove the hacks?

This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :).

I hate the "Role visibility" dropdown. Can we remove it?

I also do not like it. The goal is to remove it. Follow #189997.

…ruleTypeIds

cnasikas · 2024-05-20T13:08:01Z

/ci

consulthys

LGT Stack Monitoring

cauemarcondes · 2024-12-02T09:20:30Z

x-pack/plugins/observability_solution/apm/common/alerting/config/apm_alerting_feature_ids.ts

 ];
+
+export const apmAlertingRuleTypeIds: string[] = [...OBSERVABILITY_RULE_TYPE_IDS];


Suggested change

export const apmAlertingRuleTypeIds: string[] = [...OBSERVABILITY_RULE_TYPE_IDS];

export const apmAlertingRuleTypeIds: string[] = OBSERVABILITY_RULE_TYPE_IDS;

The OBSERVABILITY_RULE_TYPE_IDS is used in a lot of places and I wanted to avoid someone mutating the original array. const protect us with assignments but it does not with push.

...lity_solution/infra/public/pages/metrics/hosts/components/tabs/alerts/alerts_tab_content.tsx

...observability_solution/infra/public/pages/metrics/hosts/components/tabs/alerts_tab_badge.tsx

cauemarcondes

LGTM

ashokaditya

Did code review only and LGTM. Thanks for taking on the massive changeset. 🙇🏼

elasticmachine · 2024-12-03T08:14:51Z

💚 Build Succeeded

Buildkite Build
Commit: a0c2dab
Storybooks Preview
Cloud Deployment
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-183756-a0c2dabc3c1c

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`alerting`	248	247	-1
`cases`	823	827	+4
`observability`	1074	1073	-1
`securitySolution`	6310	6308	-2
`triggersActionsUi`	862	860	-2
total			-2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`@kbn/alerts-ui-shared`	305	295	-10
`@kbn/rule-data-utils`	135	140	+5
`@kbn/securitysolution-rules`	25	26	+1
`alerting`	848	839	-9
`ml`	63	64	+1
`ruleRegistry`	226	222	-4
`uptime`	1	0	-1
total			-17

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`alerting`	94.1KB	91.4KB	-2.6KB
`apm`	3.4MB	3.4MB	+84.0B
`cases`	492.7KB	492.8KB	+80.0B
`discover`	812.9KB	813.3KB	+400.0B
`infra`	1.8MB	1.8MB	-358.0B
`ml`	4.7MB	4.7MB	-217.0B
`observability`	472.8KB	472.0KB	-830.0B
`observabilityLogsExplorer`	147.3KB	147.7KB	+358.0B
`securitySolution`	14.6MB	14.6MB	-68.0B
`slo`	848.0KB	848.3KB	+275.0B
`synthetics`	1.1MB	1.1MB	+404.0B
`triggersActionsUi`	1.7MB	1.7MB	-359.0B
`uptime`	467.5KB	467.5KB	+4.0B
total			-2.8KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`features`	2	3	+1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`apm`	38.1KB	38.6KB	+531.0B
`cases`	159.5KB	159.6KB	+146.0B
`infra`	56.9KB	58.1KB	+1.2KB
`ml`	75.8KB	76.5KB	+681.0B
`observability`	103.8KB	104.5KB	+766.0B
`observabilityShared`	93.7KB	93.7KB	+36.0B
`slo`	28.8KB	29.2KB	+394.0B
`synthetics`	37.6KB	37.8KB	+150.0B
`triggersActionsUi`	128.9KB	128.5KB	-363.0B
`uptime`	22.9KB	22.9KB	+1.0B
total			+3.5KB

Unknown metric groups

API count

id	before	after	diff
`@kbn/alerts-grouping`	31	32	+1
`@kbn/alerts-ui-shared`	321	311	-10
`@kbn/rule-data-utils`	138	152	+14
`@kbn/securitysolution-rules`	28	29	+1
`alerting`	880	872	-8
`ml`	148	149	+1
`ruleRegistry`	263	259	-4
total			-5

ESLint disabled line counts

id	before	after	diff
`@kbn/test-suites-xpack`	722	721	-1

Total ESLint disabled count

id	before	after	diff
`@kbn/test-suites-xpack`	747	746	-1

History

💛 Build #256038 was flaky e7afb8e
💔 Build #255861 failed 7758052
💔 Build #255822 failed 9432aa3
💛 Build #254445 was flaky 1ed4c79

cc @cnasikas

kibanamachine · 2024-12-03T10:22:13Z

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/12137618585

kibanamachine · 2024-12-03T10:27:19Z

💔 All backports failed

Status	Branch	Result
❌	8.x	Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 183756

Questions ?

Please refer to the Backport tool documentation

cnasikas · 2024-12-03T10:55:56Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

…83756) (#202667) # Backport This will backport the following commits from `main` to `8.x`: - [[ResponseOps][Alerting] Decouple feature IDs from consumers (#183756)](#183756)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

…183756) ## Summary This PR aims to decouple the feature IDs from the `consumer` attribute of rules and alerts. Towards: elastic#187202 Fixes: elastic#181559 Fixes: elastic#182435 > [!NOTE] > Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs. ## Architecture ### Alerting RBAC model The Kibana security uses Elasticsearch's [application privileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges). This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the `rule_type_id` and `consumer` attributes to define who owns the rule and the alerts procured by the rule. To connect the `rule_type_id` and `consumer` with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as `alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the `consumer` should be a valid feature ID. For example, the `alerting:siem.esqlRule/siem/rule/get` action, means that a user with a role that grants this action can get a rule of type `siem.esqlRule` with consumer `siem`. ### Problem statement At the moment the `consumer` attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically: - Rule types cannot support more than one consumer. - To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents. - The API calls are feature ID-oriented and not rule-type-oriented. - The framework has to be aware of the values of the `consumer` attribute. - Feature IDs are tightly coupled with the alerting indices leading to [bugs](elastic#179082). - Legacy consumers that are not a valid feature anymore can cause [bugs](elastic#184595). - The framework has to be aware of legacy consumers to handle edge cases. - The framework has to be aware of specific consumers to handle edge cases. ### Proposed solution This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as: ``` // Old formatting id: 'siem', <--- feature ID alerting:['siem.queryRule'] // New formatting id: 'siem', <--- feature ID alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting ``` The new actions are constructed as `alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For example `alerting:rule-type-id/my-consumer/rule/get`. The new action means that a user with a role that grants this action can get a rule of type `rule-type` with consumer `my-consumer`. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example: **Old formatting** Schema: ``` id: 'logs', <--- feature ID alerting:['.es-query'] <-- rule type ID ``` Generated action: ``` alerting:.es-query/logs/rule/get ``` **New formatting** Schema: ``` id: 'siem', <--- feature ID alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting ``` Generated action: ``` alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before ``` In both formating the actions are the same thus breaking changes are avoided. ### Alerting authorization class The alerting plugin uses and exports the alerting authorization class (`AlertingAuthorization`). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the `_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the cornerstone of the alerting RBAC. Please review carefully. ### Instantiation of the alerting authorization class The `AlertingAuthorizationClientFactory` is used to create instances of the `AlertingAuthorization` class. The `AlertingAuthorization` class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the `AlertingAuthorization` class was assigning `Promise` objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the `AlertingAuthorization` class or any client that depends on it (`getRulesClient` for example) is an async operation. ### Filtering A lot of routes use the authorization class to get the authorization filter (`getFindAuthorizationFilter`), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them. ### Legacy consumers & producer A lot of rules and alerts have been created and are still being created from observability with the `alerts` consumer. When the Alerting RBAC encounters a rule or alert with `alerts` as a consumer it falls back to the `producer` of the rule type ID to construct the actions. For example if a rule with `ruleTypeId: .es-query` and `consumer: alerts` the alerting action will be constructed as `alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the producer of the `.es-query` rule type. The `producer` is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the `alerts` consumer. To avoid breaking changes all feature privileges that specify access to rule types add the `alerts` consumer when configuring their alerting privileges. By moving the `alerts` consumer to the registration of the feature we can stop relying on the `producer`. The `producer` is not used anymore in the authorization class. In the next PRs the `producer` will removed entirely. ### Routes The following changes were introduced to the alerting routes: - All related routes changed to be rule-type oriented and not feature ID oriented. - All related routes support the `ruleTypeIds` and the `consumers` parameters for filtering. In all routes, the filters are constructed as `ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the `stackAlerts` consumer even if the user has access to them. - The `/internal/rac/alerts/_feature_ids` route got deleted as it was not used anywhere in the codebase and it was internal. All the changes in the routes are related to internal routes and no breaking changes are introduced. ### Constants I moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and exported all security solution rule type IDs from `kbn-securitysolution-rules`. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like `observablility` (category or subcategory) or even `apm.*` and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue elastic#187202. ### Other notable code changes - Change all instances of feature IDs to rule type IDs. - `isSiemRuleType`: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations. - Rename alerting `PluginSetupContract` and `PluginStartContract` to `AlertingServerSetup` and `AlertingServerStart`. This made me touch a lot of files but I could not resist. - `filter_consumers` was mistakenly exposed to a public API. It was undocumented. - Files or functions that were not used anywhere in the codebase got deleted. - Change the returned type of the `list` method of the `RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string, RegistryRuleType>`. - Assertion of `KueryNode` in tests changed to an assertion of KQL using `toKqlExpression`. - Removal of `useRuleAADFields` as it is not used anywhere. ## Testing > [!CAUTION] > It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider: > - The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser. > - The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features. > - The changes in this PR are backward compatible with the previous users' permissions. ### Solutions Please test and verify that: - All the rule types you own with all possible combinations of permissions both in ESS and in Serverless. - The consumers and rule types make sense when registering the features. - The consumers and rule types that are passed to the components are the intended ones. ### ResponseOps The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test: - The rules we own with all possible combinations of permissions. - The stack alerts page and its solution filtering. - The categories filtering in the maintenance window UI. ## Risks > [!WARNING] > The risks involved in this PR are related to privileges. Specifically: > - Users with no privileges can access rules and alerts they do not have access to. > - Users with privileges cannot access rules and alerts they have access to. > > An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks. ## FQA - I noticed that a lot of routes support the `filter` parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters? The `filter` parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be `alerting.attributes.rule_type_id`. In this filter the consumer should know a) the name of the SO b) the keyword `attributes` (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it. - I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I noticed a lot of hacks like checking if the rule type is `siem`. Should not remove the hacks? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I hate the "Role visibility" dropdown. Can we remove it? I also do not like it. The goal is to remove it. Follow elastic#189997. --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Paula Borgonovi <[email protected]>

## Summary ### Problem With the merge of the PR #183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

## Summary ### Problem With the merge of the PR elastic#183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) (cherry picked from commit d9c1cd3)

…183756) ## Summary This PR aims to decouple the feature IDs from the `consumer` attribute of rules and alerts. Towards: elastic#187202 Fixes: elastic#181559 Fixes: elastic#182435 > [!NOTE] > Unfortunately, I could not break the PR into smaller pieces. The APIs could not work anymore with feature IDs and had to convert them to use rule type IDs. Also, I took the chance and refactored crucial parts of the authorization class that in turn affected a lot of files. Most of the changes in the files are minimal and easy to review. The crucial changes are in the authorization class and some alerting APIs. ## Architecture ### Alerting RBAC model The Kibana security uses Elasticsearch's [application privileges](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-privileges.html#security-api-put-privileges). This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the `rule_type_id` and `consumer` attributes to define who owns the rule and the alerts procured by the rule. To connect the `rule_type_id` and `consumer` with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as `alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>`. Because to authorizate a resource an action has to be generated and because the action needs a valid feature ID the value of the `consumer` should be a valid feature ID. For example, the `alerting:siem.esqlRule/siem/rule/get` action, means that a user with a role that grants this action can get a rule of type `siem.esqlRule` with consumer `siem`. ### Problem statement At the moment the `consumer` attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically: - Rule types cannot support more than one consumer. - To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents. - The API calls are feature ID-oriented and not rule-type-oriented. - The framework has to be aware of the values of the `consumer` attribute. - Feature IDs are tightly coupled with the alerting indices leading to [bugs](elastic#179082). - Legacy consumers that are not a valid feature anymore can cause [bugs](elastic#184595). - The framework has to be aware of legacy consumers to handle edge cases. - The framework has to be aware of specific consumers to handle edge cases. ### Proposed solution This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as: ``` // Old formatting id: 'siem', <--- feature ID alerting:['siem.queryRule'] // New formatting id: 'siem', <--- feature ID alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting ``` The new actions are constructed as `alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>`. For example `alerting:rule-type-id/my-consumer/rule/get`. The new action means that a user with a role that grants this action can get a rule of type `rule-type` with consumer `my-consumer`. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example: **Old formatting** Schema: ``` id: 'logs', <--- feature ID alerting:['.es-query'] <-- rule type ID ``` Generated action: ``` alerting:.es-query/logs/rule/get ``` **New formatting** Schema: ``` id: 'siem', <--- feature ID alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting ``` Generated action: ``` alerting:.es-query/logs/rule/get <--- consumer is set as logs and the action is the same as before ``` In both formating the actions are the same thus breaking changes are avoided. ### Alerting authorization class The alerting plugin uses and exports the alerting authorization class (`AlertingAuthorization`). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, and cleaned up, all method arguments converted to be an object, and the response signature of some methods changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR especially the `_getAuthorizedRuleTypesWithAuthorizedConsumers` method which is the cornerstone of the alerting RBAC. Please review carefully. ### Instantiation of the alerting authorization class The `AlertingAuthorizationClientFactory` is used to create instances of the `AlertingAuthorization` class. The `AlertingAuthorization` class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes the `AlertingAuthorization` class was assigning `Promise` objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the `AlertingAuthorization` class or any client that depends on it (`getRulesClient` for example) is an async operation. ### Filtering A lot of routes use the authorization class to get the authorization filter (`getFindAuthorizationFilter`), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was built in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with router filtering. I believe these two operations should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, router in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them. ### Legacy consumers & producer A lot of rules and alerts have been created and are still being created from observability with the `alerts` consumer. When the Alerting RBAC encounters a rule or alert with `alerts` as a consumer it falls back to the `producer` of the rule type ID to construct the actions. For example if a rule with `ruleTypeId: .es-query` and `consumer: alerts` the alerting action will be constructed as `alerting:.es-query/stackAlerts/rule/get` where `stackRules` is the producer of the `.es-query` rule type. The `producer` is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the `alerts` consumer. To avoid breaking changes all feature privileges that specify access to rule types add the `alerts` consumer when configuring their alerting privileges. By moving the `alerts` consumer to the registration of the feature we can stop relying on the `producer`. The `producer` is not used anymore in the authorization class. In the next PRs the `producer` will removed entirely. ### Routes The following changes were introduced to the alerting routes: - All related routes changed to be rule-type oriented and not feature ID oriented. - All related routes support the `ruleTypeIds` and the `consumers` parameters for filtering. In all routes, the filters are constructed as `ruleTypeIds: ['foo'] AND consumers: ['bar'] AND authorizationFilter`. Filtering by consumers is important. In o11y for example, we do not want to show ES rule types with the `stackAlerts` consumer even if the user has access to them. - The `/internal/rac/alerts/_feature_ids` route got deleted as it was not used anywhere in the codebase and it was internal. All the changes in the routes are related to internal routes and no breaking changes are introduced. ### Constants I moved the o11y and stack rule type IDs to `kbn-rule-data-utils` and exported all security solution rule type IDs from `kbn-securitysolution-rules`. I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like `observablility` (category or subcategory) or even `apm.*` and the framework should know which rule type IDs to pick up. I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward. I will try to clean up as much as possible in further iterations. If you are interested in the upcoming work follow this issue elastic#187202. ### Other notable code changes - Change all instances of feature IDs to rule type IDs. - `isSiemRuleType`: This is a temporary helper function that is needed in places where we handle edge cases related to security solution rule types. Ideally, the framework should be agnostic to the rule types or consumers. The plan is to be removed entirely in further iterations. - Rename alerting `PluginSetupContract` and `PluginStartContract` to `AlertingServerSetup` and `AlertingServerStart`. This made me touch a lot of files but I could not resist. - `filter_consumers` was mistakenly exposed to a public API. It was undocumented. - Files or functions that were not used anywhere in the codebase got deleted. - Change the returned type of the `list` method of the `RuleTypeRegistry` from `Set<RegistryRuleType>` to `Map<string, RegistryRuleType>`. - Assertion of `KueryNode` in tests changed to an assertion of KQL using `toKqlExpression`. - Removal of `useRuleAADFields` as it is not used anywhere. ## Testing > [!CAUTION] > It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider: > - The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser. > - The correct rules, alerts, and aggregations on top of them are being shown as expected by a user with limited access to certain features. > - The changes in this PR are backward compatible with the previous users' permissions. ### Solutions Please test and verify that: - All the rule types you own with all possible combinations of permissions both in ESS and in Serverless. - The consumers and rule types make sense when registering the features. - The consumers and rule types that are passed to the components are the intended ones. ### ResponseOps The most important changes are in the alerting authorization class, the search strategy, and the routes. Please test: - The rules we own with all possible combinations of permissions. - The stack alerts page and its solution filtering. - The categories filtering in the maintenance window UI. ## Risks > [!WARNING] > The risks involved in this PR are related to privileges. Specifically: > - Users with no privileges can access rules and alerts they do not have access to. > - Users with privileges cannot access rules and alerts they have access to. > > An excessive list of integration tests is in place to ensure that the above scenarios will not occur. In the case of a bug, we could a) release an energy release for serverless and b) backport the fix in ESS. Given that this PR is intended to be merged in 8.17 we have plenty of time to test and to minimize the chances of risks. ## FQA - I noticed that a lot of routes support the `filter` parameter where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated parameters? The `filter` parameter should not be exposed in the first place. It assumes that the consumer of the API knows the underlying structure and implementation details of the persisted storage API (SavedObject client API). For example, a valid filter would be `alerting.attributes.rule_type_id`. In this filter the consumer should know a) the name of the SO b) the keyword `attributes` (storage implementation detail) and c) the name of the attribute as it is persisted in ES (snake case instead of camel case as it is returned by the APIs). As there is no abstraction layer between the SO and the API, it makes it very difficult to make changes in the persistent schema or the APIs. For all the above I decided to introduce new query parameters where the alerting framework has total control over it. - I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I noticed a lot of hacks like checking if the rule type is `siem`. Should not remove the hacks? This PR is a step forward making the framework as agnostic as possible. I had to keep the scope of the PR as contained as possible. We will get there. It needs time :). - I hate the "Role visibility" dropdown. Can we remove it? I also do not like it. The goal is to remove it. Follow elastic#189997. --------- Co-authored-by: kibanamachine <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Paula Borgonovi <[email protected]>

## Summary ### Problem With the merge of the PR elastic#183756, the alerts function has stopped working in the Obs AI Assistant, because there has been a change to the query (when finding alerts) ### Solution Revert the change made to the query. ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

Change the schema of the alerting feature privilege

ed3015e

cnasikas added the release_note:skip Skip the PR/issue when compiling release notes label May 17, 2024

cnasikas self-assigned this May 17, 2024

cnasikas added 5 commits May 18, 2024 19:20

Merge branch 'main' into poc_decouple_consumers_feature_ids

82a1a65

Change augmentRuleTypesWithAuthorization to use consumers instead of …

c8ec120

…ruleTypeIds

Support legacy consumers

1a02609

Fixes in rule filtering

55f91fd

Merge branch 'main' into poc_decouple_consumers_feature_ids

b3cea58

cnasikas added 2 commits May 23, 2024 13:37

Merge branch 'main' into poc_decouple_consumers_feature_ids

5184d21

Change new schema

3b7c89f

cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from 06d8499 to 3b7c89f Compare May 23, 2024 15:47

cnasikas changed the title ~~[POC] Decouple feature IDs from consumres~~ [POC] Decouple feature IDs from consumers May 28, 2024

cnasikas added 2 commits May 30, 2024 16:59

Merge branch 'main' into poc_decouple_consumers_feature_ids

b1a0a63

Filter out rule types with no registered consumers

8b5fd17

cnasikas mentioned this pull request Jun 5, 2024

[ResponseOps][Alerting] Decouple rule producer/consumer settings from Kibana feature ID #181559

Closed

cnasikas mentioned this pull request Jun 29, 2024

[ResponseOps][Meta] Alerting RBAC enhancements #187202

Open

cnasikas added 7 commits July 5, 2024 20:47

Merge branch 'main' into poc_decouple_consumers_feature_ids

ac46730

Merge branch 'main' into poc_decouple_consumers_feature_ids

c88e147

Refactor the way the AlertingAuthorization object is created

b69bff1

Add the alerts consumers to all rule types

e7363b1

Fix async type errors

e1b9845

Add test for alerting authorization object creation

41e5491

Add tests and support filtering

a2b73f9

cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from f65ca2c to a2b73f9 Compare July 14, 2024 15:08

cnasikas added 5 commits July 16, 2024 13:21

Add more tests

0bb255b

Merge branch 'main' into poc_decouple_consumers_feature_ids

4c0c828

Finalize alerting auth unit tests and functionality

736a63e

Fix types

0a6261e

Fix tests

c4094d0

cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from cefbb02 to 1ed4c79 Compare November 26, 2024 13:36

consulthys approved these changes Nov 26, 2024

View reviewed changes

cnasikas removed request for a team November 26, 2024 18:41

joemcelroy approved these changes Nov 27, 2024

View reviewed changes

cnasikas added 3 commits December 2, 2024 10:31

Merge branch 'main' into poc_decouple_consumers_feature_ids

6d984e0

Fix arguments in getAuthorizedAlertsIndices

9432aa3

Fix alerting types

643e04f

cauemarcondes reviewed Dec 2, 2024

View reviewed changes

PR feedback

7758052

cauemarcondes approved these changes Dec 2, 2024

View reviewed changes

Fix types in security solution

e7afb8e

ashokaditya approved these changes Dec 2, 2024

View reviewed changes

Merge branch 'main' into poc_decouple_consumers_feature_ids

a0c2dab

cnasikas merged commit a3496c9 into elastic:main Dec 3, 2024
10 checks passed

cnasikas deleted the poc_decouple_consumers_feature_ids branch December 3, 2024 10:21

cnasikas mentioned this pull request Dec 3, 2024

[8.x] [ResponseOps][Alerting] Decouple feature IDs from consumers (#183756) #202667

Merged

viduni94 mentioned this pull request Dec 10, 2024

[Obs AI Assistant] Fix alerts function #203695

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

cnasikas commented May 17, 2024 •

edited by kibanamachine

Loading

cnasikas commented May 20, 2024

consulthys left a comment

cauemarcondes Dec 2, 2024

cnasikas Dec 2, 2024

cauemarcondes left a comment

ashokaditya left a comment

elasticmachine commented Dec 3, 2024 •

edited

Loading

API count

ESLint disabled line counts

Total ESLint disabled count

kibanamachine commented Dec 3, 2024

kibanamachine commented Dec 3, 2024

cnasikas commented Dec 3, 2024

		];

		export const apmAlertingRuleTypeIds: string[] = [...OBSERVABILITY_RULE_TYPE_IDS];

[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

Conversation

cnasikas commented May 17, 2024 • edited by kibanamachine Loading

Summary

Architecture

Alerting RBAC model

Problem statement

Proposed solution

Alerting authorization class

Instantiation of the alerting authorization class

Filtering

Legacy consumers & producer

Routes

Constants

Other notable code changes

Testing

Solutions

ResponseOps

Risks

FQA

cnasikas commented May 20, 2024

consulthys left a comment

Choose a reason for hiding this comment

cauemarcondes Dec 2, 2024

Choose a reason for hiding this comment

cnasikas Dec 2, 2024

Choose a reason for hiding this comment

cauemarcondes left a comment

Choose a reason for hiding this comment

ashokaditya left a comment

Choose a reason for hiding this comment

elasticmachine commented Dec 3, 2024 • edited Loading

💚 Build Succeeded

Metrics [docs]

Module Count

Public APIs missing comments

Async chunks

Public APIs missing exports

Page load bundle

API count

ESLint disabled line counts

Total ESLint disabled count

History

kibanamachine commented Dec 3, 2024

kibanamachine commented Dec 3, 2024

💔 All backports failed

Manual backport

Questions ?

cnasikas commented Dec 3, 2024

💚 All backports created successfully

Questions ?

cnasikas commented May 17, 2024 •

edited by kibanamachine

Loading

elasticmachine commented Dec 3, 2024 •

edited

Loading