[Response Ops] Onboard metric threshold rule type to use framework alerts as data #166664

ymao1 · 2023-09-18T19:20:22Z

Summary

Removes the lifecycle executor wrapper around the metric threshold rule type executor so that this rule type uses the framework alerts client to write alerts-as-data (AaD) documents.

Response ops changes

Passes the task's startedAt date to the alerts client. Lifecycle executor rules use this standardized timestamp for the @timestamp field of the AaD doc, as well as for the start and end time of an alert.
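
A minimal sketch of the idea, not the real alerts client: the class, method names, and document shape below are hypothetical, and only the field names (@timestamp, kibana.alert.start, kibana.alert.end) come from the AaD documents. It assumes a single startedAt value per task run is reused for every timestamp written during that run.

```typescript
// Hypothetical, trimmed-down alert document; the real AaD doc has many more fields.
interface AlertDocSketch {
  '@timestamp': string;
  'kibana.alert.start': string;
  'kibana.alert.end'?: string;
}

// Hypothetical stand-in for the alerts client, reduced to timestamp handling.
class TimestampingAlertsClient {
  private readonly runTimestamp: string;

  // startedAt is the task run's start date, handed in once per run.
  constructor(startedAt: Date) {
    this.runTimestamp = startedAt.toISOString();
  }

  // A new alert starts at the task's startedAt time.
  newAlert(): AlertDocSketch {
    return { '@timestamp': this.runTimestamp, 'kibana.alert.start': this.runTimestamp };
  }

  // A recovered alert keeps its start time and ends at this run's startedAt time.
  recoverAlert(existing: AlertDocSketch): AlertDocSketch {
    return { ...existing, '@timestamp': this.runTimestamp, 'kibana.alert.end': this.runTimestamp };
  }
}
```

Using one timestamp per run means the start and end values line up with the @timestamp of the document written in the same run, instead of drifting by however long the executor took.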

Metric threshold rule changes

Switch to using the alerts client in the executor to report alerts and to get recovered alert information.

ymao1 · 2023-10-04T13:46:41Z

x-pack/plugins/alerting/server/alerts_service/alerts_service.ts

@@ -275,7 +275,7 @@ export class AlertsService implements IAlertsService {
    // check whether this context has been registered before
    if (this.registeredContexts.has(context)) {
      const registeredOptions = this.registeredContexts.get(context);
-      if (!isEqual(opts, registeredOptions)) {
+      if (!isEqual(omit(opts, 'shouldWrite'), omit(registeredOptions, 'shouldWrite'))) {


The metric threshold alert registration options are actually used by multiple rule types within O11y, so to limit the scope of this onboarding effort to a single rule type, I'm allowing different shouldWrite flags for the same context. Everything else should be the same (field maps, component template refs, etc).
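
A rough, self-contained sketch of that comparison. The options type is a hypothetical subset of the real registration options, and withoutShouldWrite and deepEqual are small stand-ins for lodash's omit and isEqual so the snippet runs without dependencies.

```typescript
// Hypothetical subset of the registration options.
interface RegistrationOptionsSketch {
  context: string;
  mappings: { fieldMap: Record<string, { type: string; required: boolean }> };
  shouldWrite?: boolean;
}

// Stand-in for omit(opts, 'shouldWrite').
function withoutShouldWrite(
  opts: RegistrationOptionsSketch
): Omit<RegistrationOptionsSketch, 'shouldWrite'> {
  const { shouldWrite: _ignored, ...rest } = opts;
  return rest;
}

// Stand-in for isEqual, sufficient for plain JSON-like values.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  if (aKeys.length !== Object.keys(bObj).length) return false;
  return aKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(bObj, key) && deepEqual(aObj[key], bObj[key])
  );
}

// Two registrations for the same context are compatible if they differ only in shouldWrite.
function isCompatibleRegistration(
  incoming: RegistrationOptionsSketch,
  registered: RegistrationOptionsSketch
): boolean {
  return deepEqual(withoutShouldWrite(incoming), withoutShouldWrite(registered));
}
```

With this check, a rule type that sets shouldWrite: true and another that omits it can share a context, while a difference in the field map is still treated as a conflict.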

ymao1 · 2023-10-04T13:57:02Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/metric_threshold_executor.ts

+> & {
+  // Defining a custom type for this because the schema generation script doesn't allow explicit null values
+  'kibana.alert.evaluation.values'?: Array<number | null>;
+};


We auto-generate a schema for each context based on the registered field map. The auto-generated schema for this context can be found at packages/kbn-alerts-as-data-utils/src/schemas/generated/observability_metrics_schema.ts. The mapping definition for kibana.alert.evaluation.values is an array of scaled_float values, which translates to the schema type Array<string | number> (since ES accepts numerical strings and coerces them into numbers). However, the type that's generated by the code is Array<number | null>. To accommodate this, I'm modifying the auto-generated schema (which is just meant to be a convenience based on the field map anyway).
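
A small sketch of how such an override can be typed. GeneratedMetricsAlertSketch is a hypothetical two-field stand-in for the generated schema type, and the Omit-based shape is an assumption about the part of the diff not shown above.

```typescript
// Hypothetical stand-in for the auto-generated schema type, trimmed to two fields.
type GeneratedMetricsAlertSketch = {
  'kibana.alert.reason'?: string;
  'kibana.alert.evaluation.values'?: Array<string | number>;
};

// Drop the generated field type and redeclare it so explicit nulls are allowed.
type MetricThresholdAlertSketch = Omit<
  GeneratedMetricsAlertSketch,
  'kibana.alert.evaluation.values'
> & {
  'kibana.alert.evaluation.values'?: Array<number | null>;
};

// A payload with a missing evaluation value type-checks against the override.
const examplePayload: MetricThresholdAlertSketch = {
  'kibana.alert.reason': 'CPU usage is above threshold',
  'kibana.alert.evaluation.values': [0.92, null],
};
```

Omitting the key first matters: intersecting the two array types directly would narrow the element type to number and reject null.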

ymao1 · 2023-10-04T13:59:02Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/metric_threshold_executor.ts

-      getAlertByAlertUuid,
-    } = services;
+    const { alertsClient, savedObjectsClient } = services;
+    if (!alertsClient) {


Currently the alerts client is initialized behind an undocumented feature flag (which defaults to true), so the alertsClient is typed as possibly undefined. We plan to remove the feature flag, at which point we will be able to remove this check.
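
The guard can be pictured roughly like this. The interfaces and the helper name are hypothetical, and the error thrown here is a plain Error rather than whatever the executor actually throws.

```typescript
// Hypothetical minimal client and services shapes.
interface AlertsClientLike {
  report(alert: { id: string; actionGroup: string }): void;
}

interface ExecutorServicesSketch {
  // Possibly undefined while the client sits behind a feature flag.
  alertsClient?: AlertsClientLike | null;
}

// Narrow the optional client once, at the top of the executor.
function getAlertsClientOrThrow(services: ExecutorServicesSketch): AlertsClientLike {
  const { alertsClient } = services;
  if (!alertsClient) {
    throw new Error('Alerts client is not available for this rule execution');
  }
  return alertsClient;
}
```

After the check, the rest of the executor can use the client without optional chaining, and the guard is the only code to delete once the flag goes away.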

ymao1 · 2023-10-04T13:59:26Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/metric_threshold_executor.ts

          [ALERT_REASON]: reason,
-          [ALERT_ACTION_GROUP]: actionGroup,


The action group is set by the framework, so it does not need to be explicitly reported back.
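
As an illustration only: the helper below is hypothetical and the action group id is an example value. It assumes the framework stamps kibana.alert.action_group onto the document from the action group passed when reporting, so the executor payload carries only rule-specific fields.

```typescript
// Hypothetical arguments for reporting an alert.
interface ReportArgsSketch {
  id: string;
  actionGroup: string;
  payload: Record<string, unknown>;
}

const ALERT_ACTION_GROUP = 'kibana.alert.action_group';
const ALERT_REASON = 'kibana.alert.reason';

// In this sketch the framework-owned field is applied last, on top of the executor payload.
function buildAlertDoc({ actionGroup, payload }: ReportArgsSketch): Record<string, unknown> {
  return { ...payload, [ALERT_ACTION_GROUP]: actionGroup };
}

// The executor payload no longer includes the action group.
const exampleDoc = buildAlertDoc({
  id: 'host-1',
  actionGroup: 'example.fired',
  payload: { [ALERT_REASON]: 'CPU usage is above threshold' },
});
```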

elasticmachine · 2023-10-05T19:01:37Z

Pinging @elastic/response-ops (Team:ResponseOps)

ymao1 · 2023-10-09T12:01:28Z

@elasticmachine merge upstream

mikecote

Response Ops changes LGTM! Pulled down and tested locally, tried the migration scenario and the alerts kept persisting as expected 👍

maryam-saeidi

Tested locally and alerts were generated as expected 👏🏻 Just added some clarification questions.

I also checked some action variables and didn't see context.host; I will double-check it on Monday. Please let me know if you have any other specific scenario in mind that needs to be tested :)

maryam-saeidi · 2023-10-13T13:35:05Z

...ck/plugins/infra/server/lib/alerting/metric_threshold/register_metric_threshold_rule_type.ts

-    alerts: MetricsRulesTypeAlertDefinition,
+    alerts: {
+      ...MetricsRulesTypeAlertDefinition,
+      shouldWrite: true,


What does this shouldWrite mean?

We added this flag to the framework registration because we didn't want rules that are registered with both the rule registry and the alerting framework to write alert docs from both. We should be able to remove it once we move all rules away from the rule registry.
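
A hedged sketch of the opt-in. The definition object, its context value, and frameworkWritesAlerts are hypothetical, and treating an omitted flag as "do not write" is an assumption about the framework's default.

```typescript
// Hypothetical shared alert definition used by several rule types.
interface AlertDefinitionSketch {
  context: string;
  mappings: { fieldMap: Record<string, { type: string }> };
  shouldWrite?: boolean;
}

const SharedMetricsAlertDefinition: AlertDefinitionSketch = {
  context: 'example.metrics',
  mappings: { fieldMap: { 'kibana.alert.reason': { type: 'keyword' } } },
};

// The onboarded rule type opts in to framework-written alert docs.
const metricThresholdAlerts: AlertDefinitionSketch = {
  ...SharedMetricsAlertDefinition,
  shouldWrite: true,
};

// A rule type still on the rule registry reuses the definition unchanged.
const notYetOnboardedAlerts: AlertDefinitionSketch = { ...SharedMetricsAlertDefinition };

// Assumption: the framework writes docs only when the flag is explicitly true.
function frameworkWritesAlerts(definition: AlertDefinitionSketch): boolean {
  return definition.shouldWrite === true;
}
```

This keeps exactly one writer per rule type: the framework for the onboarded rule, and the rule registry for everything that has not moved yet.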

maryam-saeidi · 2023-10-13T14:45:25Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/metric_threshold_executor.ts

-        const indexedStartedAt =
-          getAlertStartedDate(UNGROUPED_FACTORY_KEY) ?? startedAt.toISOString();
-
-        alert.scheduleActions(actionGroupId, {


Where does the scheduling action happen?

It is rolled into the alertsClient.report call. It doesn't seem like any rule executor reports an alert without wanting an action scheduled for it, so we do it in a single call instead of asking rule executors to explicitly create and then schedule.
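
To make the "single call" point concrete, here is a toy client. The class and its internals are hypothetical and much simpler than the real alerts client; the action group id and alert id are example values.

```typescript
// Hypothetical shape of what an executor passes when reporting an alert.
interface ReportedAlertSketch {
  id: string;
  actionGroup: string;
  context: Record<string, unknown>;
  payload: Record<string, unknown>;
}

interface ScheduledActionSketch {
  alertId: string;
  actionGroup: string;
  context: Record<string, unknown>;
}

// Toy client: reporting an alert also schedules its actions.
class ReportingAlertsClient {
  private readonly reported = new Map<string, ReportedAlertSketch>();
  private readonly scheduled: ScheduledActionSketch[] = [];

  report(alert: ReportedAlertSketch): void {
    // Step 1: record the alert (previously done by creating it via the alert factory).
    this.reported.set(alert.id, alert);
    // Step 2: schedule actions (previously a separate alert.scheduleActions call).
    this.scheduled.push({
      alertId: alert.id,
      actionGroup: alert.actionGroup,
      context: alert.context,
    });
  }

  getScheduledActions(): ScheduledActionSketch[] {
    return [...this.scheduled];
  }
}

const sketchClient = new ReportingAlertsClient();
sketchClient.report({
  id: 'host-1',
  actionGroup: 'example.fired',
  context: { reason: 'CPU usage is above threshold' },
  payload: { 'kibana.alert.reason': 'CPU usage is above threshold' },
});
```

The executor therefore never holds an alert that exists but has no scheduled action, which is the case the reply above says no rule executor needs.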

ymao1 · 2023-10-13T16:40:57Z

@elasticmachine merge upstream

maryam-saeidi

Tested context.host with host.name as the only group-by field and it worked as expected.

ymao1 · 2023-10-17T11:56:41Z

@elasticmachine merge upstream

ymao1 · 2023-10-18T14:48:55Z

@elasticmachine merge upstream

kibana-ci · 2023-10-18T16:03:16Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 66e19e0

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #66 / Cases - group 1 Create case "before each" hook for "creates a case with custom fields"
[job] [logs] Explore - Security Solution Cypress Tests #2 / risk tab with new risk score renders the table renders the table

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id	before	after	diff
`infra`	45	43	-2

References to deprecated APIs

id	before	after	diff
`infra`	49	48	-1

Total ESLint disabled count

id	before	after	diff
`infra`	54	52	-2

History

💚 Build #168543 succeeded 263bcae
💚 Build #167879 succeeded 307f70c
💚 Build #166286 succeeded f47ad0e
💛 Build #165655 was flaky 638e1df
💛 Build #165248 was flaky e1326ae
💛 Build #165117 was flaky f8fcdee343191a203bb71a5a67a7c48b09e0d5c2

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

Onboarding metric threshold rule to FAaD

e1326ae

ymao1 force-pushed the metric-threshold-rule-aad branch from f8fcdee to e1326ae Compare October 4, 2023 12:11

ymao1 changed the title ~~Metric threshold rule aad~~ [Response Ops] Onboard metric threshold rule type to use framework alerts as data Oct 4, 2023

ymao1 commented Oct 4, 2023

ymao1 added 2 commits October 5, 2023 10:50

Merge branch 'main' of github.com:elastic/kibana into metric-threshol…

dcbcb58

…d-rule-aad

Handling objects in schema generation

638e1df

ymao1 self-assigned this Oct 5, 2023

ymao1 added the release_note:skip, Team:ResponseOps, Feature:Alerting/Alerts-as-Data, and v8.12.0 labels Oct 5, 2023

ymao1 marked this pull request as ready for review October 5, 2023 19:01

ymao1 requested review from a team as code owners October 5, 2023 19:01

Merge branch 'main' into metric-threshold-rule-aad

f47ad0e

mikecote self-requested a review October 10, 2023 17:54

mikecote approved these changes Oct 10, 2023

maryam-saeidi self-requested a review October 13, 2023 08:09

maryam-saeidi reviewed Oct 13, 2023

Merge branch 'main' into metric-threshold-rule-aad

307f70c

maryam-saeidi approved these changes Oct 16, 2023

Merge branch 'main' into metric-threshold-rule-aad

263bcae

neptunian approved these changes Oct 18, 2023

Merge branch 'main' into metric-threshold-rule-aad

66e19e0

ymao1 merged commit f4dda26 into elastic:main Oct 18, 2023

kibanamachine added the backport:skip label Oct 18, 2023

ymao1 deleted the metric-threshold-rule-aad branch October 18, 2023 16:09

watson mentioned this pull request Oct 19, 2023

[CI] Idea: Merge gating #103180

Closed

mikecote mentioned this pull request Nov 22, 2023

AAD Adoption - Onboard remaining O11y rule types to use alerts-as-data #171793

Closed

6 tasks
