[APM] Alerting use cases and examples #103785

Open
sorenlouv opened this issue Jun 29, 2021 · 11 comments
Labels
apm:alerting Team:APM All issues that need APM UI Team support

Comments

@sorenlouv
Member

sorenlouv commented Jun 29, 2021

1. Alerting on Garbage collection

Ability to create alerts for garbage collection metrics.
Source

Another similar request:

I would like to add that it would also be useful to be able to create latency threshold and error count alerts at the granularity of individual transactions.

For example, we have a requirement that each request must be processed server-side in less than 0.5 s.
An application can often have 50+ different API endpoints (transactions).
If only one of them is slow (avg 1 s) while the remaining 49 are fast (avg 0.2 s), the service-level average still looks healthy.

Also requested in #86108, #134481
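A minimal worked sketch of the arithmetic above, assuming every endpoint receives the same number of requests (the request does not say so). The endpoint names are hypothetical. With equal weights, the service-level average is (49 × 0.2 + 1 × 1.0) / 50 = 0.216 s, comfortably under the 0.5 s threshold, while a per-transaction check flags the slow endpoint.

```python
# Illustration only: a service-level average can hide one slow endpoint.
# Assumes equal request volume per endpoint; endpoint names are made up.
THRESHOLD_S = 0.5


def average(values):
    return sum(values) / len(values)


# 49 fast endpoints averaging 0.2 s and one slow endpoint averaging 1.0 s
endpoint_latency_s = {f"GET /api/endpoint-{i}": 0.2 for i in range(49)}
endpoint_latency_s["GET /api/slow-endpoint"] = 1.0

# Service-level check: one number for the whole service
service_avg = average(list(endpoint_latency_s.values()))
service_alert = service_avg > THRESHOLD_S

# Per-transaction check: compare each endpoint against the threshold
breaching = sorted(name for name, lat in endpoint_latency_s.items() if lat > THRESHOLD_S)

print(f"service-level average: {service_avg:.3f} s, alert={service_alert}")
print(f"per-transaction breaches: {breaching}")
```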

2. Ability to create rules for multiple services (but not all) at a time

It would be nice if we could create and manage alerts for multiple services (but not every service).

#104886
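A hypothetical sketch of what "one rule for several (but not all) services" could mean: the rule carries an explicit set of service names, and only those services are evaluated against the threshold. `LatencyRule` and `evaluate` are illustrative names, not part of any Kibana API, and the input is assumed to be a precomputed average latency per service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRule:
    """Hypothetical rule scoped to an explicit subset of services."""
    services: frozenset
    threshold_ms: float


def evaluate(rule: LatencyRule, avg_latency_ms_by_service: dict) -> list:
    """Return in-scope services whose average latency exceeds the rule's threshold."""
    return sorted(
        svc
        for svc, latency in avg_latency_ms_by_service.items()
        if svc in rule.services and latency > rule.threshold_ms
    )


rule = LatencyRule(services=frozenset({"checkout", "payments"}), threshold_ms=500)
# "search" breaches the threshold but is outside the rule's scope, so it is ignored.
print(evaluate(rule, {"checkout": 620, "payments": 180, "search": 900}))
```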

3. Alerts on dependencies (#16724, #166309)

The customer wants to create an APM alert based on a dependency's latency (like Redis or Elasticsearch itself) instead of the entire service's latency.
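A minimal sketch of alerting on dependency latency instead of service latency. It assumes span documents flattened into dicts whose keys are modeled on the APM fields `span.destination.service.resource` and `span.duration.us`. The function name and threshold handling are hypothetical; the point is only that exit spans are grouped by destination before being compared against a threshold.

```python
from collections import defaultdict


def dependency_latency_breaches(spans, threshold_ms):
    """Average exit-span duration per dependency; return those above threshold_ms.

    Each span is a flat dict; spans without a destination resource are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # resource -> [sum of durations in us, span count]
    for span in spans:
        resource = span.get("span.destination.service.resource")
        if resource is None:
            continue  # not a call to a downstream dependency
        totals[resource][0] += span["span.duration.us"]
        totals[resource][1] += 1
    # Convert microseconds to milliseconds per dependency
    averages_ms = {r: total / count / 1000 for r, (total, count) in totals.items()}
    return {r: avg for r, avg in averages_ms.items() if avg > threshold_ms}
```

With a 100 ms threshold, a Redis dependency averaging 3 ms stays quiet while an Elasticsearch dependency averaging 250 ms would alert, regardless of how the service as a whole performs.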

4. Alerts for throughput and failure rate anomalies (#159288)

The customer thinks the request for alerts on an ML job that looks for increased error rates makes perfect sense, and that it should become part of the out-of-the-box experience.

https://github.com/elastic/enhancements/issues/12409

5. Add KQL filtering to APM rules

It should be possible to add custom filtering using KQL. This has been requested by users and will align APM rules with other Observability rules.

@sorenlouv sorenlouv added [zube]: Inbox apm:alerting Team:APM All issues that need APM UI Team support and removed [zube]: Inbox labels Jun 29, 2021
@sorenlouv sorenlouv changed the title [APM] Alerting on garbage collection [APM] Alerting use cases and examples Aug 18, 2021
@botelastic

botelastic bot commented Feb 14, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the stale Used to mark issues that were closed for being stale label Feb 14, 2022
@acrewdson
Contributor

+1 for

  1. Alert for specific transaction name

I think this could be a valuable feature for many teams using APM. As noted above, in a typical API, response times for certain endpoints can be more critical than others, and it would be nice for latency-based alerts in APM to be capable of representing this. Being able to alert on latency in a more granular way, by targeting specific transactions, would be really helpful.

@botelastic botelastic bot removed the stale Used to mark issues that were closed for being stale label Feb 15, 2022
@sorenlouv
Member Author

Thanks for chiming in @acrewdson! I'll make sure to include your feedback and will try to get this on the roadmap.

@bradleydamato

Hey team, has this item been roadmapped? If not, are there any workarounds to enable this functionality? Specifically, I'm looking to create latency threshold alerts for specific transactions (rather than at the service level).

@chrisdistasio

Alert on Dependency metrics

@e-parth-pathak

@sqren Adding the use case of a customer by anonymising the customer data. I have replaced the names.

USE CASE:

  1. The customer should be able to configure an Anomaly Detection alert on a single transaction of an APM service.
    For example, in the customer's environment a Java APM agent collects metrics and writes the data into the service some-java-service-name.
  2. In this service we have a transaction: sometransaction#name
  3. Currently, they can configure an alert on an ML anomaly detection job only for the entire service.

OBSERVATIONS:

  1. Right now, machine learning jobs consider data points from the apm-* data view in Kibana, so the data is fetched from the entire collection of apm-* indices.
  2. We need more fine-tuned, granular filters so that, within a given service, only certain transactions are monitored for anomalies.
  3. We tried creating a custom data view selecting only the apm-7.13.2-transaction-* indices, but this does not filter the data by transaction.
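A rough sketch of the filtering these observations ask for: narrow the data to one `service.name` and one `transaction.name` before scoring it. The scoring here is a plain z-score against recent history, used only as a stand-in for Elastic's ML anomaly detection, which works differently. The document shape and function names are assumptions.

```python
from statistics import mean, stdev


def transaction_series(docs, service, transaction):
    """Durations (microseconds) for one transaction of one service only."""
    return [
        d["transaction.duration.us"]
        for d in docs
        if d["service.name"] == service and d["transaction.name"] == transaction
    ]


def is_anomalous(history, latest, z_threshold=3.0):
    """Stand-in for ML anomaly detection: flag `latest` if it lies more than
    z_threshold standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # constant history: any change is unusual
    return abs(latest - mu) / sigma > z_threshold
```

Because other transactions of the same service are filtered out first, a spike in sometransaction#name is scored against its own history rather than being diluted by the rest of some-java-service-name.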

@chrisdistasio

chrisdistasio commented Feb 24, 2023

For discussion, from Slack thread:

Anna Maria Modée
Hi team!
Got a question today which highlights how we’ve been working in silos (that was discussed in the call previously):
In the Security roadmap, we are planning an alert (probably EQL based) to identify missing events. This can also be useful in an observability use case, and it is actually being requested by NNIT. https://github.com/elastic/security-team/issues/2835
Wouldn’t we also be able to use this in Observability?
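For illustration only, one simple way to express a "missing events" condition is a windowed count: split a time range into fixed windows and report the windows that received no events. This is not the EQL-based rule mentioned above; the function name, integer timestamps, and window size are assumptions.

```python
def missing_event_windows(timestamps, start, end, window):
    """Return the start time of every window in [start, end) with no events.

    timestamps, start, end and window are integers in the same unit (e.g. seconds).
    """
    occupied = {(t - start) // window for t in timestamps if start <= t < end}
    n_windows = (end - start + window - 1) // window  # ceiling division
    return [start + i * window for i in range(n_windows) if i not in occupied]
```

For events at 0, 30, 70 and 200 s over [0, 240) with 60 s windows, only the window starting at 120 s is empty, so that is the one an alert would report.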

@sorenlouv
Member Author

Removing the following items from the list:

  • Ability to alert when any individual error group exceeds the threshold (group by error.grouping_key)
  • Ability to create rule for specific transaction group (filter on transaction.name)

These were implemented in #154241, #155405 and #155410 and shipped in 8.8 🎉
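A hypothetical sketch of the semantics of the two items listed above: count errors per `error.grouping_key`, optionally restricted to one `transaction.name`, and report every group over the threshold. It illustrates the behavior only, not the implementation that shipped in 8.8; the flat document shape is assumed.

```python
from collections import Counter


def error_groups_over_threshold(errors, threshold, transaction_name=None):
    """Count errors per grouping key (optionally for one transaction) and
    return each group whose count exceeds the threshold."""
    counts = Counter(
        e["error.grouping_key"]
        for e in errors
        if transaction_name is None or e.get("transaction.name") == transaction_name
    )
    return {key: n for key, n in counts.items() if n > threshold}
```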

@hp0620
Copy link

hp0620 commented Sep 11, 2023

Hello, @sqren,

Do we have anything on the roadmap for:

  1. Alerts on dependencies (https://github.com/elastic/enhancements/issues/16724)
    The customer wants to create an APM alert based on a dependency's latency (like Redis or Elasticsearch itself) instead of the entire service's latency.

I have another customer asking about the availability of this feature and was wondering if there are any updates we can share.

Thank you.

@sorenlouv
Member Author

sorenlouv commented Sep 12, 2023

@hp0620 I've created a dedicated issue to track this: #166309. Do you have any more details around the use case that I can add?

@hp0620

hp0620 commented Sep 15, 2023

Thanks @sqren for creating a separate issue to track.

Customer shared the use case with us:

One of the application development teams would like to track/monitor the latency of an mssql dependency for their service rather than monitoring latency at the service or transaction level. This would provide value by allowing them to track poorly performing queries.

Hope this helps you better understand the use case behind the request. Let me know if you need anything else.

Projects
None yet
Development

No branches or pull requests

7 participants