[Response Ops] Onboard metric threshold rule type to use framework alerts as data #166664
@@ -275,7 +275,7 @@ export class AlertsService implements IAlertsService {
     // check whether this context has been registered before
     if (this.registeredContexts.has(context)) {
       const registeredOptions = this.registeredContexts.get(context);
-      if (!isEqual(opts, registeredOptions)) {
+      if (!isEqual(omit(opts, 'shouldWrite'), omit(registeredOptions, 'shouldWrite'))) {
The metric threshold alert registration options are actually used by multiple rule types within O11y, so to limit the scope of this onboarding effort to a single rule type, I'm allowing different `shouldWrite` flags for the same context. Everything else should be the same (field maps, component template refs, etc.).
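To illustrate the comparison described in this comment, here is a minimal, self-contained sketch. It does not use lodash or the real `AlertsService`; the `RegistrationOptions` shape, the helper names, and the context value are hypothetical stand-ins chosen only for illustration.

```typescript
// Hypothetical, simplified stand-in for the options a rule type registers.
interface RegistrationOptions {
  context: string;
  mappings: Record<string, string>;
  shouldWrite?: boolean;
}

// Return a copy of the options without the `shouldWrite` flag.
function withoutShouldWrite(
  opts: RegistrationOptions
): Omit<RegistrationOptions, 'shouldWrite'> {
  const { shouldWrite: _ignored, ...rest } = opts;
  return rest;
}

// Compare two registrations while ignoring `shouldWrite`.
// This explicit comparison is only adequate for the flat shape used in this sketch.
function sameRegistration(a: RegistrationOptions, b: RegistrationOptions): boolean {
  const x = withoutShouldWrite(a);
  const y = withoutShouldWrite(b);
  const xKeys = Object.keys(x.mappings).sort();
  const yKeys = Object.keys(y.mappings).sort();
  return (
    x.context === y.context &&
    xKeys.length === yKeys.length &&
    xKeys.every((key, i) => key === yKeys[i] && x.mappings[key] === y.mappings[key])
  );
}

// Two rule types sharing one context but differing only in `shouldWrite`.
const onboardedRule: RegistrationOptions = {
  context: 'example.context', // illustrative value
  mappings: { 'kibana.alert.evaluation.values': 'scaled_float' },
  shouldWrite: true,
};
const notYetOnboardedRule: RegistrationOptions = { ...onboardedRule, shouldWrite: false };

console.log(sameRegistration(onboardedRule, notYetOnboardedRule)); // true
```

With this kind of check, a second registration for the same context is rejected only when something other than `shouldWrite` differs, such as the field map.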
> & {
  // Defining a custom type for this because the schema generation script doesn't allow explicit null values
  'kibana.alert.evaluation.values'?: Array<number | null>;
};
We auto-generate a schema for each context based on the registered field map. The auto-generated schema can be found at `packages/kbn-alerts-as-data-utils/src/schemas/generated/observability_metrics_schema.ts`. The mapping definition for `kibana.alert.evaluation.values` is an array of `scaled_float` values, which translates to a schema of `Array<string | number>` (since ES accepts numerical strings and will coerce them into numbers). However, the type that's generated by the code is `Array<number | null>`. To accommodate this, I'm modifying the auto-generated schema (which is just meant to be a convenience based on the field map anyway).
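The type adjustment described here can be sketched as follows. `GeneratedMetricsAlert` is a hypothetical stand-in for the auto-generated schema type, and the use of `Omit` is an assumption about how the field is replaced, not a copy of the actual change.

```typescript
// Hypothetical stand-in for an auto-generated alert document type.
interface GeneratedMetricsAlert {
  'kibana.alert.reason'?: string;
  // Derived from a scaled_float mapping: ES accepts numbers and numerical strings.
  'kibana.alert.evaluation.values'?: Array<string | number>;
}

// Drop the generated field and re-declare it with the type the executor produces.
type MetricThresholdAlert = Omit<
  GeneratedMetricsAlert,
  'kibana.alert.evaluation.values'
> & {
  'kibana.alert.evaluation.values'?: Array<number | null>;
};

// An explicit null is now accepted for a missing evaluation value.
const alertDoc: MetricThresholdAlert = {
  'kibana.alert.reason': 'example reason',
  'kibana.alert.evaluation.values': [0.93, null],
};

console.log(alertDoc['kibana.alert.evaluation.values']);
```

Intersecting with the generated type directly, without first omitting the field, would narrow the element type to `number` and still reject `null`, which is why the sketch removes the original property first.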
-    getAlertByAlertUuid,
-  } = services;
+  const { alertsClient, savedObjectsClient } = services;
+  if (!alertsClient) {
Currently the alerts client is initialized behind an undocumented feature flag (which defaults to true), so `alertsClient` is typed as possibly undefined. We plan to remove the feature flag, at which point we will be able to remove this check.
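A minimal sketch of this kind of guard is shown below. The `SketchAlertsClient` and `SketchExecutorServices` interfaces and the error message are hypothetical; the real executor may handle the missing client differently.

```typescript
// Hypothetical, pared-down shapes used only for this sketch.
interface SketchAlertsClient {
  report(alert: { id: string }): void;
}

interface SketchExecutorServices {
  // Possibly undefined while the client sits behind a feature flag.
  alertsClient?: SketchAlertsClient | null;
}

// Narrow the optional client to a definite one, failing fast if it is missing.
function requireAlertsClient(services: SketchExecutorServices): SketchAlertsClient {
  const { alertsClient } = services;
  if (!alertsClient) {
    throw new Error('Expected alertsClient to be defined');
  }
  return alertsClient;
}

// Usage: after the guard, the compiler treats the client as defined.
const reportedIds: string[] = [];
const client = requireAlertsClient({
  alertsClient: { report: (alert) => reportedIds.push(alert.id) },
});
client.report({ id: 'host-1' });
console.log(reportedIds);
```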
[ALERT_REASON]: reason,
[ALERT_ACTION_GROUP]: actionGroup,
The action group is set by the framework, so it does not need to be explicitly reported back.
Pinging @elastic/response-ops (Team:ResponseOps)
@elasticmachine merge upstream
Response Ops changes LGTM! Pulled down and tested locally, tried the migration scenario and the alerts kept persisting as expected 👍
-  alerts: MetricsRulesTypeAlertDefinition,
+  alerts: {
+    ...MetricsRulesTypeAlertDefinition,
+    shouldWrite: true,
What does this `shouldWrite` mean?
We added this flag to the framework registration because we didn't want rule types that are registered with both the rule registry and the alerting framework to have both of them write alert documents. We should be able to remove it once we move all rules away from the rule registry.
const indexedStartedAt =
  getAlertStartedDate(UNGROUPED_FACTORY_KEY) ?? startedAt.toISOString();

alert.scheduleActions(actionGroupId, {
Where does the scheduling action happen?
It is rolled into the `alertsClient.report` call. It doesn't seem like any rule executor reports an alert without wanting an action scheduled for it, so we do it in a single call instead of asking rule executors to explicitly create and then schedule.
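The design choice described here, one call that both records the alert and schedules its action, can be sketched with a toy client. `ToyAlertsClient`, its fields, and the action group value are hypothetical and do not reflect the real alerts client implementation or its signature.

```typescript
// Hypothetical record kept by the toy client for each reported alert.
interface ToyReportedAlert {
  id: string;
  actionGroup: string;
  context: Record<string, unknown>;
  actionScheduled: boolean;
}

class ToyAlertsClient {
  private readonly alerts: ToyReportedAlert[] = [];

  // Reporting an alert also schedules its action, so callers make a single call.
  report(opts: {
    id: string;
    actionGroup: string;
    context?: Record<string, unknown>;
  }): ToyReportedAlert {
    const alert: ToyReportedAlert = {
      id: opts.id,
      actionGroup: opts.actionGroup,
      context: opts.context ?? {},
      actionScheduled: true,
    };
    this.alerts.push(alert);
    return alert;
  }

  // Alerts whose actions are scheduled for this execution.
  getScheduled(): ToyReportedAlert[] {
    return this.alerts.filter((alert) => alert.actionScheduled);
  }
}

// Usage: no separate create-then-schedule step in the executor.
const toyClient = new ToyAlertsClient();
toyClient.report({
  id: 'host-1',
  actionGroup: 'example.fired', // illustrative value
  context: { reason: 'example reason' },
});
console.log(toyClient.getScheduled().length); // 1
```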
Tested `context.host` with only the `host.name` group by field and it worked as expected.
💛 Build succeeded, but was flaky
cc @ymao1
Resolves #164220
Summary
Removes the lifecycle executor wrapper around the metric threshold rule type executor so that this rule type is using the framework alerts client to write alerts as data documents.
Response ops changes
`startedAt` date to the alerts client. Lifecycle executor rules use this standardized timestamp for the `@timestamp` field of the AaD doc, as well as for the start and end time of an alert.
Metric threshold rule changes