[Alerting] Stack Rules on Rule Registry POC #96966
…ing/stack-rules-rules-registry-poc
This reverts commit 44b04eb.
…ause strict mapping is enabled. Extracted lifecycle calculation into helper function for reuse
…e for showing alerts in UI
💔 Build Failed
Test Failures
Kibana Pipeline / jest / Jest Tests.x-pack/plugins/apm/server/lib/alerts.Transaction duration anomaly alert doesn't send alert ml is not defined
Kibana Pipeline / jest / Jest Tests.x-pack/plugins/apm/server/lib/alerts.Transaction duration anomaly alert doesn't send alert ml jobs are not available
Kibana Pipeline / jest / Jest Tests.x-pack/plugins/apm/server/lib/alerts.Transaction duration anomaly alert doesn't send alert anomaly is less than threshold
and 11 more failures, only showing the first 3.
Have we identified specific fields that need to be different? The downside is that we're introducing another index, but we can use aliases to query across them when needed, so the benefit might outweigh the drawback here. 🤔
Index threshold and ES query would actually play nicely together in the same index because both are threshold-dependent. The tracking containment alert is the odd one out: it lives inside stack rules, so it would presumably be written to the StackRulesRegistry, but it would contain geo-specific information. If we assume that we might have more stack rules like that, we could end up with a stack rules field mapping that is a superset of a bunch of field mappings that are each only used by one rule type. Maybe that's not such a big deal though if they are all ECS compliant? I know there's been a discussion about having all ECS fields in the base field mapping, so if that's the direction we're veering, then this is not such a problem, since the field mapping will already be filled with fields that are unused.
That's a fair question, but I don't think we want to box ourselves into that corner just yet.
Resolves #98319
Summary (WIP)
Use rule registry to write alerts-as-data for Index Threshold and ES Query rule types. Using terminology from the Alerts as Data Schema Definition issue, I tried to determine what, if anything, to write out as `alert` (signal) data and `metric` (evaluation) data for these two rule types.
Initially, I used the existing `CreateLifecycleRuleType` to write lifecycle data for these two rule types. Index threshold and ES query are similar in that they both specify a threshold condition. During each execution cycle, the condition is evaluated, which can generate a `metric` document. When the condition is met, the rule becomes active and will stay active if the condition continues to be met in subsequent rule executions. When the condition is not met, the rule is considered recovered. Each `active` or `recovered` alert can generate an `alert` document. Grouping the active alerts with a UUID as in the lifecycle rule makes sense as well. I extended the functionality of the `CreateLifecycleRuleType` with a `CreateThresholdRuleType` that is very similar, but allows different `status` and `action` constants (`active/recovered` vs `open/closed`) and two additional `TAdditionalRuleExecutorServices` for writing out `metric` and `event` documents.
Index Threshold
Desired data
event.kind: metric - A metric document is written during each rule execution, for each alert id. It contains the numeric value that is evaluated against the condition for this rule. Should this include the threshold and comparator from the rule params? It could also include more information, such as a description of the evaluated field (e.g. "avg of cpu.pct").
Example metric document
event.kind: alert - An alert document is written out each time the condition being evaluated during rule execution is true. A single `recovery` alert document is written out when the condition evaluation changes from `true` to `false`. This is the "mutable" doc if we are making docs mutable, so in the future, instead of a series of `alert` documents with the same `kibana.rac.alert.uuid` and a series of statuses: `active`, `active`, `active`, `active`, `recovered`, this might be a single document with a `kibana.rac.alert.uuid`, a start and end date and a duration?
Example active alert document
Example recovery alert document
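As a rough illustration of the write pattern described for Index Threshold (one metric document per execution, an alert document while the condition holds, and a single recovery document on the `true` to `false` transition), here is a minimal TypeScript sketch. It is not the PR's code. Only `event.kind`, `event.action`, `kibana.rac.alert.uuid` and `kibana.rac.alert.status` are taken from the description; `kibana.rac.alert.id`, `kibana.rac.alert.value`, the type names and `documentsForExecution` are hypothetical, as is the choice to make `event.action` mirror the status.

```typescript
// Hypothetical sketch. Only event.kind, event.action, kibana.rac.alert.uuid and
// kibana.rac.alert.status come from the description above; every other field
// name, type and function here is an illustrative assumption.

type Comparator = '>' | '>=' | '<' | '<=';
type AlertStatus = 'active' | 'recovered';
type AlertsAsDataDoc = Record<string, string | number>;

interface ThresholdParams {
  threshold: number;
  comparator: Comparator;
}

interface ExecutionInput {
  alertId: string;
  uuid: string; // groups the alert documents of one active period
  value: number; // numeric value evaluated against the condition
  wasActive: boolean; // was this alert id active after the previous execution?
  timestamp: string; // ISO 8601 execution time
  params: ThresholdParams;
}

const comparators: Record<Comparator, (value: number, threshold: number) => boolean> = {
  '>': (value, threshold) => value > threshold,
  '>=': (value, threshold) => value >= threshold,
  '<': (value, threshold) => value < threshold,
  '<=': (value, threshold) => value <= threshold,
};

function buildMetricDoc(input: ExecutionInput): AlertsAsDataDoc {
  return {
    '@timestamp': input.timestamp,
    'event.kind': 'metric',
    'kibana.rac.alert.id': input.alertId, // assumed field name
    'kibana.rac.alert.value': input.value, // assumed field name
  };
}

function buildAlertDoc(input: ExecutionInput, status: AlertStatus): AlertsAsDataDoc {
  return {
    '@timestamp': input.timestamp,
    'event.kind': 'alert',
    'event.action': status, // assumption: the action mirrors the status
    'kibana.rac.alert.id': input.alertId, // assumed field name
    'kibana.rac.alert.uuid': input.uuid,
    'kibana.rac.alert.status': status,
  };
}

// Returns the documents one rule execution would write for one alert id.
function documentsForExecution(input: ExecutionInput): AlertsAsDataDoc[] {
  const docs: AlertsAsDataDoc[] = [buildMetricDoc(input)];
  const met = comparators[input.params.comparator](input.value, input.params.threshold);
  if (met) {
    docs.push(buildAlertDoc(input, 'active'));
  } else if (input.wasActive) {
    docs.push(buildAlertDoc(input, 'recovered'));
  }
  return docs;
}
```

With the mutable-document alternative raised above, the repeated `active` documents would instead collapse into one document per `kibana.rac.alert.uuid` that is updated in place.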
ES Query (WIP)
Desired data
… `true` for the alerts-as-data indices, so any field that is not ECS compliant will currently error and cause the data not to be indexed.
What could an alerts-as-data view look like
This shows when the alert was triggered and resolved for each alert id in the rule.
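A view like this could be derived by folding the series of alert documents that share a UUID into one record with a start, an end and a duration. The sketch below is only an illustration under assumed names: `AlertEvent`, `AlertEpisode` and `toEpisodes` are hypothetical, and the input is assumed to be already reduced to alert id, UUID, status and timestamp.

```typescript
// Hypothetical sketch: collapse per-execution alert documents into one
// "episode" per UUID, with the trigger time, resolve time and duration.

interface AlertEvent {
  alertId: string;
  uuid: string;
  status: 'active' | 'recovered';
  timestamp: string; // ISO 8601
}

interface AlertEpisode {
  alertId: string;
  uuid: string;
  start: string;
  end?: string; // unset while the alert is still active
  durationMs?: number;
}

function toEpisodes(events: AlertEvent[]): AlertEpisode[] {
  // Sort a copy chronologically so the first event per UUID is the trigger.
  const sorted = events
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));

  const byUuid: Record<string, AlertEpisode> = {};
  const order: string[] = [];

  for (const event of sorted) {
    let episode: AlertEpisode | undefined = byUuid[event.uuid];
    if (episode === undefined) {
      episode = { alertId: event.alertId, uuid: event.uuid, start: event.timestamp };
      byUuid[event.uuid] = episode;
      order.push(event.uuid);
    }
    if (event.status === 'recovered') {
      episode.end = event.timestamp;
      episode.durationMs = Date.parse(event.timestamp) - Date.parse(episode.start);
    }
  }

  return order.map((uuid) => byUuid[uuid]);
}
```

An episode without an `end` is one that has triggered but not yet resolved.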
Questions/Thoughts
… `event.action` and `kibana.rac.alert.status` in the `CreateLifecycleRuleType`, for example).
… `alert` and `metric` data, I created factories with service functions for each of these types that the rule executor could call to write specific data. Is this the right approach? For example, `createThresholdRuleTypeFactory` has an `alertWithThreshold` and a `metricWithThreshold` callback. If these are meant to be generic, framework-level helpers, it might be useful to have "alert data factories" and "metric data factories" and be able to compose them in different ways in order to reuse them.
`signal` and `metric` documents may look very different from each other and be used in different ways. I have registered a single `stack-alerts` type of rule registry, which bootstraps a `.kibana-alerts-stack-alerts*` index, but it might make sense to have separate `.kibana-alerts-index-threshold*` and `.kibana-alerts-es-query` data indices as well. Rather than creating an IndexThresholdRuleRegistry and an EsQueryRuleRegistry, it might be nice to create a StackAlertsRuleRegistry but bootstrap multiple data indices.
`CreateLifecycleRuleType` resembles what is happening in the alerting TaskRunner `executeAlertInstances` function with respect to figuring out which alerts are new/active/recovered. New things that might be nice to have in the framework (not sure if this will be true for all rule registry types):
`CreateLifecycleRuleType` copies the latest active document for a recovered alert
`alert.uuid` to track a grouping of active alerts
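The new/active/recovered bookkeeping mentioned above can be sketched as a small pure function. This is a simplified illustration, not the `CreateLifecycleRuleType` or `executeAlertInstances` implementation: `categorizeAlerts`, `TrackedState` and the injected `generateUuid` are hypothetical names, and the alert ids passed in are assumed to be unique.

```typescript
// Hypothetical sketch: compare the alert ids tracked after the previous
// execution with the ids whose condition is met now, assigning a UUID to each
// new alert and reusing it for as long as the alert stays active.

interface TrackedAlert {
  uuid: string;
  started: string; // ISO 8601 time the alert first became active
}

type TrackedState = Record<string, TrackedAlert>;

interface LifecycleResult {
  newIds: string[];
  ongoingIds: string[];
  recoveredIds: string[];
  // UUID per alert id, including recovered ones so a recovery doc can reuse it.
  uuids: Record<string, string>;
  // State to persist for the next execution (recovered alerts are dropped).
  nextState: TrackedState;
}

function categorizeAlerts(
  previous: TrackedState,
  currentIds: string[],
  now: string,
  generateUuid: () => string
): LifecycleResult {
  const result: LifecycleResult = {
    newIds: [],
    ongoingIds: [],
    recoveredIds: [],
    uuids: {},
    nextState: {},
  };

  for (const id of currentIds) {
    const tracked: TrackedAlert | undefined = previous[id];
    if (tracked === undefined) {
      // First execution where the condition is met: start a new grouping.
      result.newIds.push(id);
      result.nextState[id] = { uuid: generateUuid(), started: now };
    } else {
      // Still active: keep the same UUID and start time.
      result.ongoingIds.push(id);
      result.nextState[id] = tracked;
    }
    result.uuids[id] = result.nextState[id].uuid;
  }

  for (const id of Object.keys(previous)) {
    if (currentIds.indexOf(id) === -1) {
      // Previously active but the condition is no longer met.
      result.recoveredIds.push(id);
      result.uuids[id] = previous[id].uuid;
    }
  }

  return result;
}
```

Injecting `generateUuid` keeps the function deterministic under test; a real caller would pass whatever UUID generator the framework already uses.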