
These fields are missing from the Uptime Alerting pick list #66893

Closed
DanRoscigno opened this issue May 18, 2020 · 6 comments
Labels
bug Fixes for quality problems that affect the customer experience Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability

Comments

@DanRoscigno
Contributor

Kibana version:
7.7.0

Elasticsearch version:
7.7.0

Server OS version:
ESS

Describe the bug:
These fields are missing from the Alerting UI within Uptime:

monitor.name
monitor.type
observer.geo.name
observer.hostname
url.full
url.path
error.message

Expected behavior:
I should be able to produce an alert with information like this:

SpringToDoApp is down from heartbeat@us-east-1a. The error message is Post http://roscigno-obs:8080/demo/add connect: connection refused.

from data like this:

[screenshot of a sample Heartbeat document showing these fields]

@DanRoscigno DanRoscigno added the Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability label May 18, 2020
@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@DanRoscigno DanRoscigno added the bug Fixes for quality problems that affect the customer experience label May 21, 2020
@shahzad31 shahzad31 self-assigned this May 25, 2020
@shahzad31
Contributor

shahzad31 commented May 26, 2020

@DanRoscigno thanks for opening this. I have a question about the behaviour:

SpringToDoApp is down from heartbeat@us-east-1a. The error message is Post http://roscigno-obs:8080/demo/add connect: connection refused.

This will work nicely if there is only one monitor selected for that alert, or only one monitor is down, but it will become confusing when multiple monitors are down per alert.

I think there are two ways to handle this. One is to generate a separate message for each down monitor, rather than combining multiple down monitors into a single alert as happens today:

Down monitors: always-down, auto-icmp-0X24948F467C6C4F01, badhost-badssl... and 3 other monitors
Last triggered at: 2020-05-25T15:02:57.875Z
always-down from Unnamed location; auto-icmp-0X24948F467C6C4F01 from Unnamed location; badhost-badssl from Unnamed location; expired-badssl from Unnamed location; gmail-smtp from Unnamed location; google-dns from Unnamed location;

The other way would be to generate a single alert message with one line per down monitor. I am interested in hearing your thoughts on this.
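The two options above can be sketched roughly like this (a hypothetical illustration only; `DownMonitor` and its field names are assumptions modeled on the ECS fields requested in this issue, not Kibana's actual alerting API):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DownMonitor:
    # Hypothetical stand-in for a down-monitor check result;
    # attributes mirror the fields requested in this issue.
    name: str            # monitor.name
    location: str        # observer.geo.name
    error_message: str   # error.message


def per_monitor_messages(down: List[DownMonitor]) -> List[str]:
    """Option 1: produce a separate alert message per down monitor."""
    return [
        f"{m.name} is down from {m.location}. The error message is {m.error_message}"
        for m in down
    ]


def combined_message(down: List[DownMonitor]) -> str:
    """Option 2: produce a single alert with one line per down monitor."""
    return "\n".join(per_monitor_messages(down))
```

With one monitor down, both options reduce to the same single-line message from the issue description; they only diverge once several monitors are down at once.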

cc @justinkambic @drewpost @andrewvc

@DanRoscigno
Contributor Author

DanRoscigno commented May 26, 2020

Hi @shahzad31, in my experience there are a few use cases for basic uptime monitoring:

  • Is a service down from a subset of locations?
  • Is a service down from all locations?
  • Is there a larger problem?

(where location == end-user geography)

Here is how I think about this in more detail:

  • Having individual alerts for each service at each location tells me if the service is down from a subset of locations.
  • Having multiple alerts for a single service (one alert for each location) lets me know that there is a widespread infrastructure issue (e.g., the network is down to several locations).
  • Having multiple alerts (from multiple services at a single location this time) lets me know that a shared resource is down (e.g., power at a datacenter, network at a datacenter)
  • Having multiple alerts from multiple locations tells me that a resource shared by multiple locations is down (e.g., WAN, shared database, etc.)

The alerts and actions tool in Kibana has a "create alert per" feature (under the Metric Threshold tab) that I think might be very useful; in fact, I think it would be great if we could have two or more "create alert per" entries. In the above use cases I would want one per service.id and one per observer.geo.name.
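The grouping described above can be sketched as follows (hypothetical; the flattened check dicts and their keys are assumptions based on the fields discussed in this issue, not real Heartbeat output):

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def alerts_per_group(down_checks: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group down checks so that one alert would fire per
    (service.id, observer.geo.name) pair.

    Each check is a flattened event dict; only the two grouping
    fields are assumed here.
    """
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for check in down_checks:
        key = (check["service.id"], check["observer.geo.name"])
        groups[key].append(check)
    return dict(groups)
```

One alert per group then distinguishes the cases above: many groups sharing one service.id suggests a widespread issue for that service, while many groups sharing one observer.geo.name suggests a shared-resource failure at that location.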

I just realized that I had not asked for service.id. I think that service.id and service.name should be considered two very important fields in all observability tabs/apps/pages in the solution. If I were still in Ops I would not be interested in an alert from a server or a switch, but I sure would be interested in an alert about the switch port that a critical server for the online banking service was plugged into.

I am happy to talk to y'all any time about this.

@shahzad31 shahzad31 removed their assignment Jun 4, 2020
@andrewvc
Contributor

I'd like to close this in favor of elastic/uptime#237 if you're OK with it @DanRoscigno. It wraps up some related concerns with the ones raised here.

@DanRoscigno
Contributor Author

Thanks @andrewvc sounds good.

@shahzad31
Contributor

Will be resolved in PR #74659.
