Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

Draft
wants to merge 151 commits into
base: main
Choose a base branch
from

Conversation

cnasikas
Copy link
Member

@cnasikas cnasikas commented May 17, 2024

Summary

This PR aims to decouple the feature IDs from the consumer attribute of rules and alerts.

Fixes: #181559

Architecture

Alerting RBAC model

The Kibana security uses Elasticsearch's application privileges. This way Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model and is built on top of the existing Kibana security model. The Alerting RBAC uses the rule_type_id and consumer attribute to define who owns the rule and the alerts procured by the rule. To connect the rule_type_id and consumer with the Kibana security actions the Alerting RBAC registers its custom actions. They are constructed as alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>. For example alerting:siem.esqlRule/siem/rule/get. This action means that a user with a role that grants this action can get a rule of type siem.esqlRule with consumer siem.

Problem statement

At the moment the consumer attribute should be a valid feature ID. Though this approach worked well so far it has its limitation. Specifically:

  • Rule types cannot support more than one consumer.
  • To associate old rules with a new feature ID required a migration on the rule's SOs and the alerts documents.
  • The API calls are feature ID oriented and not rule type oriented.
  • The framework has to be aware of the values of the consumer attribute.
  • Feature IDs are tightly coupled with the alerting indices leading to bugs.
  • Legacy consumers that are not a valid feature anymore can cause bugs.
  • The framework has to be aware of legacy consumers to handle edge cases.

Proposed solution

This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as:

// Old formatting
id: 'siem', <--- feature ID
alerting:['siem.queryRule']

// New formatting
id: 'siem', <--- feature ID
alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting

The new actions are constructed as alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>. For example alerting:rule-type-id/my-consumer/rule/get. The new action means that a user with a role that grants this action can get a rule of type rule-type with consumer my-consumer. Changing the action strings is not considered a breaking change as long as the user's permission works as before. In our case, this is true because the consumer will be the same as before (feature ID), and the alerting security actions will be the same. For example:

Old formatting

Schema:

id: 'logs', <--- feature ID
alerting:['.es-query'] <-- rule type ID

Action:

alerting:.es-query/logs/rule/get

New formatting

Schema:

id: 'siem', <--- feature ID
alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting

Action:

alerting:.es-query/logs/rule/get <--- consumer is set as logs same as before

In both formating the actions are the same thus breaking changes are avoided.

Alerting authorization class

The alerting plugin uses and exports the alerting authorization class (AlertingAuthorization). The class is responsible to handle all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the above sections. A lot of methods were renamed, removed, cleaned up, or have their argument or return types changed. These changed affected various piece of the code. The changes in this class are the most important in this PR especially the _getAuthorizedRuleTypesWithAuthorizedConsumers method which is the cornerstone of the alerting RBAC. Please review carefully.

Instatiation of the alerting authorization class

The AlertingAuthorizationClientFactory is used to create instances of the AlertingAuthorization class. The AlertingAuthorization class needs to perform async operations upon instatiation. Because JS, at the moment, does not support async instantiation of classes the AlertingAuthorization class was assiging Promise objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve redability and make it clearer the lifecycle of the class I seperated the construction of the class (initialization) from the bootstrap process. As a result getting the AlertingAuthorization class or any client that depends on it (RulesClient for example) is an async operation.

Filtering

A lot of routes are using the authorization class to get the authorization filter (getFindAuthorizationFilter), a filter that, if applied, returns only the rule types and consumers the user is authorized to. The method that returns the filter was build in a way to also support filtering on top of the authorization filter thus coupling the authorized filter with route filtering. I believe these two operation should be decoupled and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the consumer, route in our case, to apply extra filters on top of the authorization filter. For that reason, I did all necessary changes to decoupe them.

Legacy consumers & producer

A lot of rules and alerts have been created and are still being created from observability with the alerts consumer. When the Alerting RBAC encounters a rule or alert with alerts as a consumer it falls back to the producer of the rule type ID to construct the actions. For example if a rule with ruleTypeId: .es-query and consumer: alerts the alerting action will be constructed as alerting:.es-query/stackRules/rule/get where stackRules is the producer of the .es-query rule type. The producer is used to be used in alerting authorization but due to its complexity, it was deprecated and only used as a fallback for the alerts consumer. To avoid breaking changes all rule types will set the alerts consumer as valid consumers when configuring the alerting privileges. By moving the alerts consumer to the registration of the feature we can stop relying on the producer. In the next PRs the producer will removed entirely.

Routes

The following changes were introduced to the alerting routes:

  • All related routes changed to be rule-type oriented and not feature ID oriented.
  • All related routes support the ruleTypeIds and the consumers parameters for filtering.
  • The /internal/rac/alerts/_feature_ids route got deleted as it was not used anywhere in the codebase and it was internal.

All the changes in the routes are related to internal routes and no breaking change is introduced.

Notable code changes

  • Change all instances of feature IDs to rule type IDs.
  • isSiemRuleType: Temporary helper function. The plan is to be removed entirely in further iterations.
  • Move o11y and stack rule type IDs to kbn-rule-data-utils.
  • Export all security solution rule type IDs from kbn-securitysolution-rules.
  • Rename alerting PluginSetupContract and PluginStartContract to AlertingServerSetup and AlertingServerStart.
  • Change the way the AlertingAuthorization class is instantiated.
  • getRulesClient converted to an async function.
  • Rename AlertingAuthorization methods and make its methods to take only an object as argument.
  • Change the response signature of some methods of the AlertingAuthorization class.
  • filter_consumers was mistakenly exposed to a public API. It was undocumented.
  • The getFindAuthorizationFilter authorization helper function does not accept filters. It should return a filter for all authorized rule types regardless of the request. Filtering by ruleTypeIds moved to calls to ES or the SO client.
  • Files or functions that were not used anywhere in the codebase got deleted.
  • Change the returned type of the list method of the RuleTypeRegistry from Set<RegistryRuleType> to Map<string, RegistryRuleType>.
  • Assertion of KueryNode in tests changed to assetion of KQL using toKqlExpression.

Testing

Caution

It is very important to test all the areas of the application where rules or alerst are being used directly or indirectly. Scenarios to consider:

  • The correct rules, alerts, and aggregations on top of them are being show as expected as a superuser.
  • The correct rules, alerts, and aggregations on top of them are being show as expected as a user with limitted access to certain features.
  • The changes in this PR are backwards compatible with users' roles.

Solutions

Please test all the rule types you own with all possible combinations of permissions.

ResponseOps

Please test all stack rules with all possible combinations of permissions.

Risk Matrix

FQA

  • I noticed that a lot of routes supports the filter paramater where we can pass an arbitrary KQL filter. Why we do not use this to filter by the rule type IDs and the consumers and instead we introduce new dedicated paramaters?
  • Why we need to filter by consumer? Should not the ruleTypeIds be enough?
  • I noticed in the code a lot of instances where the consumer is used. Should not remove any logic around consumers?

@cnasikas cnasikas added the release_note:skip Skip the PR/issue when compiling release notes label May 17, 2024
@cnasikas cnasikas self-assigned this May 17, 2024
@cnasikas
Copy link
Member Author

/ci

@cnasikas cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from 06d8499 to 3b7c89f Compare May 23, 2024 15:47
@cnasikas cnasikas changed the title [POC] Decouple feature IDs from consumres [POC] Decouple feature IDs from consumers May 28, 2024
@cnasikas cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from f65ca2c to a2b73f9 Compare July 14, 2024 15:08
@cnasikas
Copy link
Member Author

/ci

@cnasikas
Copy link
Member Author

/ci

Comment on lines +129 to +130
rule_type_ids: schema.maybe(schema.arrayOf(schema.string())),
consumers: schema.maybe(schema.arrayOf(schema.string())),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same schema as the public plus the rule_type_ids and the consumers.

@cnasikas
Copy link
Member Author

/ci

@@ -82,7 +81,7 @@ export type AlertFilterControlsProps = Omit<
*/
export const AlertFilterControls = (props: AlertFilterControlsProps) => {
const {
featureIds = [AlertConsumers.STACK_ALERTS],
ruleTypeIds = [],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we default it to stack rules?

Comment on lines +31 to +32
ALERTS: 'alerts',
DISCOVER: 'discover',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the consumers are decoupled from the feature IDs the list should include all possible consumers so far. alerts and discover are valid ones. Ideally, we should not have a list of possible consumers. I hope in the subsequent PRs we will remove it.

/**
* TODO: Abstract it and remove it
*/
export const isSiemRuleType = (ruleTypeId: string) => ruleTypeId.startsWith('siem.');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the codebase, we have a lot of checks (hacks) related to security rule types. To reduce the scope of the PR as much as possible I choose to try to fix it slowly on subsequent PRs. At the moment is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like observablility (category or subcategory) or even apm.* and the framework should know which rule type IDs to pick up. But again, I think it is out of the scope of the PR, and at the moment it seems the most straightforward way to move forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inspired by the cases utils for filtering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored a bit the code to accommodate the async nature of getRulesClient. It should work as before.

'apm',
'slo',
'uptime',
'observability',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the observability match to more rule type IDs I have put? Should I add all o11y rule type IDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create a rule from the main o11y page the consumer is set to alerts to most of the rules. For that reason, I added the AlertsConsumer.ALERTS to fetch alerts with kibana.alert.rule.consumer: alerts. Also, I am not sure if we want to include stack rules which I included because some of the stack rules can be created with the logs or infrastructure consumer.

@cnasikas
Copy link
Member Author

/ci

@cnasikas
Copy link
Member Author

/ci

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 20, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #21 / ObservabilityApp Observability Rules page Create rules flyout Should allow the user to select consumers when creating ES query rules
  • [job] [logs] FTR Configs #21 / ObservabilityApp Observability Rules page Create rules flyout Should allow the user to select consumers when creating ES query rules
  • [job] [logs] Jest Tests #3 / owner utils getOwnerFromRuleConsumerProducer returns owner { id: 'cases', validRuleConsumers: [Array] } correctly for consumer
  • [job] [logs] Jest Tests #3 / owner utils getOwnerFromRuleConsumerProducer returns owner { id: 'cases', validRuleConsumers: [Array] } correctly for producer
  • [job] [logs] FTR Configs #19 / Serverless observability API - feature flags Platform security APIs security/authorization available features composite features
  • [job] [logs] FTR Configs #19 / Serverless observability API - feature flags Platform security APIs security/authorization available features composite features

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
alerting 239 238 -1
cases 812 816 +4
observability 1063 1062 -1
securitySolution 6037 6035 -2
triggersActionsUi 850 848 -2
total -2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/alerts-ui-shared 304 294 -10
@kbn/rule-data-utils 133 140 +7
@kbn/securitysolution-rules 25 26 +1
alerting 848 839 -9
ml 63 64 +1
ruleRegistry 248 244 -4
total -14

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
alerting 93.8KB 91.2KB -2.6KB
apm 3.4MB 3.4MB +73.0B
cases 492.0KB 492.1KB +81.0B
discover 824.9KB 824.9KB +42.0B
infra 1.7MB 1.7MB +245.0B
ml 4.5MB 4.5MB -193.0B
observability 470.6KB 469.7KB -853.0B
observabilityLogsExplorer 147.3KB 147.6KB +358.0B
securitySolution 20.7MB 20.7MB +161.0B
slo 855.3KB 855.6KB +273.0B
synthetics 1.1MB 1.1MB +420.0B
triggersActionsUi 1.7MB 1.7MB +52.0B
total -2.0KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
apm 38.0KB 38.2KB +209.0B
cases 151.2KB 151.3KB +132.0B
infra 54.1KB 55.0KB +929.0B
ml 75.3KB 76.0KB +673.0B
observability 103.8KB 104.6KB +766.0B
observabilityShared 75.0KB 75.1KB +36.0B
securitySolution 87.5KB 87.5KB -1.0B
slo 24.8KB 25.1KB +394.0B
synthetics 37.6KB 37.9KB +325.0B
triggersActionsUi 127.4KB 127.0KB -363.0B
total +3.0KB
Unknown metric groups

API count

id before after diff
@kbn/alerts-grouping 31 32 +1
@kbn/alerts-ui-shared 320 310 -10
@kbn/rule-data-utils 136 152 +16
@kbn/securitysolution-rules 28 29 +1
alerting 880 872 -8
ml 148 149 +1
ruleRegistry 285 281 -4
total -3

ESLint disabled line counts

id before after diff
@kbn/test-suites-xpack 730 731 +1

Total ESLint disabled count

id before after diff
@kbn/test-suites-xpack 755 756 +1

History

cc @cnasikas

@@ -191,7 +223,15 @@ export const ruleRegistrySearchStrategyProvider = (
);
}

throw err;
if (Boom.isBoom(err)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search strategy was always thrown a 500 error without respecting the error codes, like 403, throwing by the alerts authorization. I tried to fix that.

);
});

test('requests the same number of rules as the number of ids provided', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I combined the three tests into one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split up the PageContent into PageContentWrapper and PageContent so the PageContent renders after the loading of the available rule types.

@@ -579,7 +583,7 @@ const AlertsTableStateWithQueryProvider = memo(

return (
<AlertsTableContext.Provider value={alertsTableContext}>
{!isLoading && alertsCount === 0 && (
{!isLoading && alertsCount <= 0 && (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If alertsCount is undefined is initialized above as -1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.17 candidate Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ResponseOps][Alerting] Decouple rule producer/consumer settings from Kibana feature ID
5 participants