[APM] Alerts for throughput and failure rate anomalies #159288
Pinging @elastic/apm-ui (Team:APM)
@elastic/apm-pm Do you think these should be separate rule types, or should the single rule we currently have alert on latency, throughput and failed transaction rate?
@sqren I agree, the name and description are misleading. We plan to add the Anomaly rule (currently only available in Stack Management) to Observability. Do you know why there is a separate "APM anomaly" rule and how it differs from the one in Stack Management? Since the ML job covers latency, throughput and failure rates, I think the anomaly detection rule in Stack Management would be able to alert on all three.
Pinging @elastic/actionable-observability (Team: Actionable Observability)
@gbamparop I would suggest that we reuse and extend the capabilities of the existing APM anomaly rule. We should make sure that the threshold settings/ranges are appropriate for the new metrics.
Agreed, that's also what I suggested in the issue description: "Instead of creating new rules the existing ApmRuleType.Anomaly rule should be updated to also produce alerts for other types of anomalies than latency."
Actually, we don't even need to think about this. The only thing the rule evaluates is severity, so a severity such as "critical" applies equally to latency, throughput and failure rate anomalies.
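To illustrate why the anomaly type does not affect the threshold, here is a minimal sketch, assuming the rule compares an anomaly's score (0 to 100) against a minimum score for the chosen severity. The `Severity` and `AnomalyType` types, the `severityThresholds` values and the `shouldAlert` helper are hypothetical and chosen for illustration; they are not taken from the Kibana source.

```typescript
// Hypothetical anomaly types produced by the APM ML job.
type AnomalyType = "latency" | "throughput" | "failure_rate";

// Hypothetical severity levels with assumed minimum anomaly scores.
type Severity = "warning" | "minor" | "major" | "critical";

const severityThresholds: Record<Severity, number> = {
  warning: 0,
  minor: 25,
  major: 50,
  critical: 75,
};

interface Anomaly {
  type: AnomalyType;
  score: number; // anomaly score in the range 0..100
}

// The check only looks at the score; the anomaly type plays no role.
function shouldAlert(anomaly: Anomaly, threshold: Severity): boolean {
  return anomaly.score >= severityThresholds[threshold];
}

const anomalies: Anomaly[] = [
  { type: "latency", score: 82 },
  { type: "throughput", score: 91 },
  { type: "failure_rate", score: 40 },
];

// Anomalies at or above "critical" are alerted, whatever their type.
const alerted = anomalies.filter((a) => shouldAlert(a, "critical"));
console.log(alerted.map((a) => a.type)); // [ 'latency', 'throughput' ]
```

Because the threshold is expressed as a severity rather than a metric-specific value (such as milliseconds or requests per minute), no new threshold ranges are needed when throughput and failure rate anomalies are added.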
Today we allow users to create anomaly detection jobs (ML jobs), which produce anomaly results for latency, throughput and failure rates.
Users can create rules and be alerted when there are latency anomalies, but they have no way of doing the same for throughput and failure rate anomalies.
There is a rule called ApmRuleType.Anomaly, and the user-facing description for this rule is:
This is quite misleading, because the rule does not in fact produce alerts for throughput or failed transaction rate. It only alerts on latency, as can be seen in the terms filter below:
kibana/x-pack/plugins/apm/server/routes/alerts/rule_types/anomaly/register_anomaly_rule_type.ts (lines 172 to 175 in 7890be6)
Solution
It should be possible to receive alerts for throughput and failure rate anomalies. Instead of creating new rules, the existing ApmRuleType.Anomaly rule should be updated to also produce alerts for anomaly types other than latency.
Related enhancement request: https://github.com/elastic/enhancements/issues/12409 (internal)
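A minimal sketch of the proposed change, assuming the rule's Elasticsearch query restricts anomaly records with a `terms` filter on a field that identifies the anomaly type. The field name `anomaly_type`, the `AnomalyType` values and the `buildAnomalyTypeFilter` helper are hypothetical placeholders; the actual filter at lines 172 to 175 of register_anomaly_rule_type.ts is not reproduced here. The `record_score` range clause stands in for the existing severity threshold.

```typescript
// Hypothetical identifiers for the three kinds of anomalies the APM ML job produces.
type AnomalyType = "latency" | "throughput" | "failure_rate";

const ALL_ANOMALY_TYPES: AnomalyType[] = ["latency", "throughput", "failure_rate"];

// Shape of an Elasticsearch `terms` query clause.
interface TermsFilter {
  terms: Record<string, string[]>;
}

// Builds the clause that restricts which anomaly records the rule evaluates.
// "anomaly_type" is a placeholder field name, not the real ML results field.
function buildAnomalyTypeFilter(types: AnomalyType[]): TermsFilter {
  return { terms: { anomaly_type: [...types] } };
}

// Current behaviour described in the issue: latency only.
const current = buildAnomalyTypeFilter(["latency"]);

// Proposed behaviour: the same rule also matches throughput and failure rate anomalies.
const proposed = buildAnomalyTypeFilter(ALL_ANOMALY_TYPES);

const query = {
  bool: {
    filter: [
      { range: { record_score: { gte: 75 } } }, // severity threshold, unchanged
      proposed,
    ],
  },
};

console.log(JSON.stringify(query.bool.filter[1]));
// {"terms":{"anomaly_type":["latency","throughput","failure_rate"]}}
```

Widening the existing filter, rather than registering new rule types, keeps a single rule and a single severity setting for all three anomaly types, which matches the approach agreed in the comments.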