[Response Ops][Alerting] Initial implementation of FAAD AlertsClient for writing generic AAD documents #156946
@@ -154,7 +152,8 @@ export function createAlertFactory<
         autoRecoverAlerts,
         // flappingSettings.enabled is false, as we only want to use this function to get the recovered alerts
         flappingSettings: DISABLE_FLAPPING_SETTINGS,
-        maintenanceWindowIds,
+        // no maintenance window IDs are passed as we only want to use this function to get recovered alerts
+        maintenanceWindowIds: [],
When we call processAlerts here, it is just to get the map of recovered alert IDs to return to the executors, so we don't need to know the maintenance window IDs.
@@ -18,6 +18,9 @@ import { RuleSnooze } from './rule_snooze_type';
 export type RuleTypeState = Record<string, unknown>;
 export type RuleTypeParams = Record<string, unknown>;
+
+// rule type defined alert fields to persist in alerts index
+export type RuleAlertData = Record<string, unknown>;
Added a new generic for rule types to specify their rule-type-specific alert schema. This probably isn't strictly needed until we implement the follow-up PR for #156443.
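As a rough illustration of how a rule type might use the new generic, here is a minimal sketch. Only `RuleAlertData` comes from the diff above; `ExampleAlertData`, `ExampleRuleTypeSketch`, and the field names are hypothetical and are not part of the alerting framework.

```typescript
// Mirrors the type added in this PR.
type RuleAlertData = Record<string, unknown>;

// Hypothetical alert schema for an illustrative rule type (assumption, not from the PR).
interface ExampleAlertData extends RuleAlertData {
  'example.threshold': number;
  'example.host': string;
}

// Hypothetical, heavily simplified stand-in for a rule type definition that is
// generic over its alert schema.
interface ExampleRuleTypeSketch<AlertData extends RuleAlertData> {
  id: string;
  buildAlertData: (alertId: string) => AlertData;
}

const exampleRuleType: ExampleRuleTypeSketch<ExampleAlertData> = {
  id: 'example.rule-type',
  // The compiler checks that the returned object matches ExampleAlertData.
  buildAlertData: (alertId) => ({
    'example.threshold': 0.9,
    'example.host': alertId,
  }),
};
```

The point of the generic is that a typo in a field name or a wrong value type in `buildAlertData` would be caught at compile time rather than at index time.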
Pinging @elastic/response-ops (Team:ResponseOps)
@elasticmachine merge upstream
Code LGTM and ran some tests locally and everything worked great 🎉 I left a bunch of questions. The main question I have is about the bulk call to update alerts and OCC.
x-pack/plugins/alerting/server/alerts_client/lib/build_recovered_alert.ts
// Set latest rule configuration
rule: rule.kibana?.alert.rule,
// Set status to 'recovered'
status: 'recovered',
question: should we also update the action_group field to recovered when building a recovered alert?
Do you have an opinion one way or the other? I was undecided on whether to update the action_group here since we would be losing the history of which action group the alert was active in, but I guess in the case of a rule type with multiple action groups, if an alert remained active but switched between groups, we would lose that history then anyway?
@ymao1 and I discussed offline and we'll update the action_group on recovery, which will solve the concern from #156946 (comment).
Updated in 1f376bd
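For readers following the thread, the agreed behavior can be sketched roughly as below. This is not the code from `build_recovered_alert.ts`: the `AlertDocSketch` shape, the function name, and the `recoveryActionGroupId` parameter are assumptions made for illustration only.

```typescript
// Hypothetical, trimmed-down alert document shape (assumption, not the real schema).
interface AlertDocSketch {
  kibana: {
    alert: {
      status: 'active' | 'recovered';
      action_group?: string;
      rule: Record<string, unknown>;
    };
  };
}

// Builds the recovered version of an existing alert document without mutating it.
function buildRecoveredAlertSketch(
  existing: AlertDocSketch,
  latestRule: Record<string, unknown>,
  recoveryActionGroupId: string
): AlertDocSketch {
  return {
    kibana: {
      alert: {
        ...existing.kibana.alert,
        // Set latest rule configuration
        rule: latestRule,
        // Set status to 'recovered'
        status: 'recovered',
        // Also move the action group to the recovery group, as discussed above
        action_group: recoveryActionGroupId,
      },
    },
  };
}
```

The trade-off raised in the thread still applies: once `action_group` is overwritten, the document no longer records which group the alert was last active in.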
x-pack/plugins/alerting/server/alerts_client/lib/build_recovered_alert.ts
body: flatMap(
  [...activeAlertsToIndex, ...recoveredAlertsToIndex].map((alert: Alert & AlertData) => [
    {
      index: {
question: What happens if the bulk call happens at the same time a user updates the same alert document? Will this overwrite the user's action and/or vice versa?
That's a good question! I think that's something that could happen currently with the existing flow, right? I think we could create a follow-up issue to discuss the correct behavior (like which update gets priority), unless you feel it's something necessary to address in this PR?
I think a follow-up issue is good for this one. We can take the time to think it through, especially if it's an edge case today and we haven't encountered it yet (I can see this being unlikely).
Issue created here: #158403
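To make the OCC question concrete, one option the follow-up could weigh is attaching `if_seq_no` and `if_primary_term` to each bulk `index` action, so that Elasticsearch rejects the write with a version conflict if the document changed since it was read. The sketch below only builds the bulk body; it is not what this PR does, and `TrackedAlertSketch` and `buildBulkBodySketch` are hypothetical names.

```typescript
// Hypothetical shape for an alert that may have been fetched from the alerts index earlier.
interface TrackedAlertSketch {
  id: string;
  // Sequence number and primary term returned when the document was read, if it existed.
  seqNo?: number;
  primaryTerm?: number;
  doc: Record<string, unknown>;
}

type BulkLine = Record<string, unknown>;

// Builds the alternating action/document lines expected by the bulk API.
function buildBulkBodySketch(alerts: TrackedAlertSketch[]): BulkLine[] {
  const body: BulkLine[] = [];
  for (const alert of alerts) {
    const action: Record<string, unknown> = { _id: alert.id };
    // Only add the concurrency guard when we know which version we read.
    if (alert.seqNo !== undefined && alert.primaryTerm !== undefined) {
      action.if_seq_no = alert.seqNo;
      action.if_primary_term = alert.primaryTerm;
    }
    body.push({ index: action }, alert.doc);
  }
  return body;
}
```

With a guard like this, a concurrent user update would surface as a per-item conflict in the bulk response instead of being silently overwritten. Which side should then win, and whether to retry, is exactly what the follow-up issue is meant to decide.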
💚 Build Succeeded
Resolves #156442
Summary
- Adds a shouldWriteAlerts flag to rule type registration, which defaults to false if not set. This prevents duplicate AAD documents from being written for the rule registry rule types that had to register with the framework in order to get their resources installed on startup.
- Initial implementation of AlertsClient, which primarily functions as a proxy to the LegacyAlertsClient. It does 2 additional things:
  a. When initialized with the active & recovered alerts from the previous execution (de-serialized from the task manager state), it queries the AAD index for the corresponding alert documents.
  b. When returning the alerts to serialize into the task manager state, it builds the alert documents and bulk upserts them into the AAD index.
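
The default described in the first bullet can be sketched as follows. The `RuleTypeRegistrationSketch` shape and `resolveShouldWriteAlerts` helper are hypothetical and only illustrate the "defaults to false if not set" behavior; the real registration type in the framework may place the flag differently.

```typescript
// Hypothetical, minimal registration shape (assumption, not the framework's type).
interface RuleTypeRegistrationSketch {
  id: string;
  // Opt-in flag: when omitted, no generic AAD documents are written.
  shouldWriteAlerts?: boolean;
}

// Resolves the flag, treating "not set" as false.
function resolveShouldWriteAlerts(ruleType: RuleTypeRegistrationSketch): boolean {
  return ruleType.shouldWriteAlerts ?? false;
}
```

Because the default is opt-out, rule registry rule types that register only to get their resources installed keep writing their own documents and nothing is duplicated.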
This PR does not opt any rule types into writing these generic docs but adds an example functional test that does. To test it out with the ES query rule type, add the following
To Verify