Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Alerting] Initial implementation of FAAD AlertsClient for writing generic AAD documents #156946

Merged
merged 21 commits into from
May 24, 2023

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented May 6, 2023

Resolves #156442

Summary

  1. Adds shouldWriteAlerts flag to rule type registration which defaults to false if not set. This prevents duplicate AAD documents from being written for the rule registry rule types that had to register with the framework in order to get their resources installed on startup.
  2. Initial implementation of AlertsClient which primarily functions as a proxy to the LegacyAlertsClient. It does 2 additional thing:
    a. When initialized with the active & recovered alerts from the previous execution (de-serialized from the task manager state), it queries the AAD index for the corresponding alert document.
    b. When returning the alerts to serialize into the task manager state, it builds the alert document and bulk upserts into the AAD index.

This PR does not opt any rule types into writing these generic docs but adds an example functional test that does. To test it out with the ES query rule type, add the following

diff --git a/x-pack/plugins/stack_alerts/server/rule_types/es_query/rule_type.ts b/x-pack/plugins/stack_alerts/server/rule_types/es_query/rule_type.ts
index 214d2ee4b76..0439a576b03 100644
--- a/x-pack/plugins/stack_alerts/server/rule_types/es_query/rule_type.ts
+++ b/x-pack/plugins/stack_alerts/server/rule_types/es_query/rule_type.ts
@@ -187,5 +187,12 @@ export function getRuleType(
     },
     producer: STACK_ALERTS_FEATURE_ID,
     doesSetRecoveryContext: true,
+    alerts: {
+      context: 'stack',
+      shouldWrite: true,
+      mappings: {
+        fieldMap: {},
+      },
+    },
   };
 }

To Verify

  • Verify that rule registry rule types still work as expected
  • Verify that non rule-registry rule types still work as expected
  • Modify a rule type to register with FAAD and write alerts and verify that the alert documents look as expected.

@ymao1 ymao1 force-pushed the alerting/faad-api-forreal branch from 71942bd to 6fc6eca Compare May 6, 2023 15:29
@ymao1 ymao1 force-pushed the alerting/faad-api-forreal branch from 6fc6eca to f39ef52 Compare May 6, 2023 15:30
@ymao1 ymao1 force-pushed the alerting/faad-api-forreal branch from b21541a to e21840d Compare May 15, 2023 16:05
@@ -154,7 +152,8 @@ export function createAlertFactory<
autoRecoverAlerts,
// flappingSettings.enabled is false, as we only want to use this function to get the recovered alerts
flappingSettings: DISABLE_FLAPPING_SETTINGS,
maintenanceWindowIds,
// no maintenance window IDs are passed as we only want to use this function to get recovered alerts
maintenanceWindowIds: [],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we call processAlerts here, it is just to get the map of recovered alert IDs to return to the executors so we don't need to know the maintenance window IDs.

@ymao1 ymao1 changed the title Alerting/faad api forreal [Response Ops][Alerting] Initial implementation of FAAD AlertsClient for writing generic AAD documents May 17, 2023
@@ -18,6 +18,9 @@ import { RuleSnooze } from './rule_snooze_type';
export type RuleTypeState = Record<string, unknown>;
export type RuleTypeParams = Record<string, unknown>;

// rule type defined alert fields to persist in alerts index
export type RuleAlertData = Record<string, unknown>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new generic for rule types to specify their rule type specific alert schema. This probably isn't strictly needed until we implement the followup PR for #156443

@ymao1 ymao1 self-assigned this May 17, 2023
@ymao1 ymao1 added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry v8.9.0 release_note:skip Skip the PR/issue when compiling release notes labels May 17, 2023
@ymao1 ymao1 force-pushed the alerting/faad-api-forreal branch from f65aa1c to 58f6254 Compare May 18, 2023 00:47
@ymao1 ymao1 marked this pull request as ready for review May 18, 2023 17:46
@ymao1 ymao1 requested a review from a team as a code owner May 18, 2023 17:46
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1
Copy link
Contributor Author

ymao1 commented May 23, 2023

@elasticmachine merge upstream

@mikecote mikecote self-requested a review May 23, 2023 14:23
Copy link
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM and ran some tests locally and everything worked great 🎉 I left a bunch of questions. The main question I have is about the bulk call to update alerts and OCC.

// Set latest rule configuration
rule: rule.kibana?.alert.rule,
// Set status to 'recovered'
status: 'recovered',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: should we also update action_group field to recovered when building a recovered alert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an opinion one way or the other? I was undecided on whether to update the action_group here since we would be losing the history of which action group the alert was active in, but I guess in the case of a rule type with multiple action groups, if an alert remained active but switched between groups, we would lose that history then anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ymao1 and I discussed offline and we'll update the action_group on recovery. Which will solve the concern from #156946 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 1f376bd

body: flatMap(
[...activeAlertsToIndex, ...recoveredAlertsToIndex].map((alert: Alert & AlertData) => [
{
index: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What happens if the bulk call happens at the same time a user updates the same alert document? Will this overwrite the user's action and/or vice versa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question! I think that's something that could happen currently with the existing flow right? I think we could create a followup issue to discuss the correct behavior (like which update gets priority) unless you feel like it's something necessary to address in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think follow up issue is good for this one, we can take the time to think it through, especially if it's an edge case today and we haven't encountered it yet (I can see this being unlikely).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created here: #158403

@ymao1 ymao1 force-pushed the alerting/faad-api-forreal branch from 866f125 to 52e1b40 Compare May 24, 2023 15:03
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 594 595 +1
Unknown metric groups

API count

id before after diff
alerting 615 617 +2

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 399 403 +4
total +6

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 479 483 +4
total +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 282305f into elastic:main May 24, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label May 24, 2023
@ymao1 ymao1 deleted the alerting/faad-api-forreal branch May 24, 2023 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Response Ops][Alerting] Create AlertsClient for writing generic alerts-as-data documents
5 participants