Uptime - Enable Suites Alerts #123969
Comments
Pinging @elastic/uptime (Team:uptime)
@paulb-elastic We've shifted strategy a lot on suites, and I'm not sure if this is still relevant. The push command itself seems like the appropriate place to report configuration errors back to the user in most cases, preventing them from pushing the configuration entirely and failing that CI step. For errors that are not caught, we do have elastic/uptime#439 in place, which will display errors when the journey cannot run. What we don't have is an alerting system for this. Should this still be prioritized for 8.3?
I agree, this no longer seems to fit in with the new approach. I've made a comment here about error reporting from running the push command.
I'm not sure I understand what this refers to. Do you mean that we need to extend elastic/uptime#439 to show errors for externally generated (suite) monitors, not just inline ones?
@paulb-elastic As far as I understand (@shahzad31, correct me if I'm wrong), "inline" is a bit of a misnomer for elastic/uptime#439, as the implementation picks up on error documents generated by Heartbeat. These errors should be generated when a journey fails to run, regardless of the source. So the existing error behavior in elastic/uptime#439 should "just work" with monitors created via push. That said, I think we're at much lower risk of monitors with configuration errors ever being pushed, since the push command can evaluate the code and report any syntax errors ahead of time. Indeed, in a suites model, the engineering team will also likely be using eslint to its fullest. The biggest error we discussed with suites was two monitors having the same id or the same name, and I would expect the push command to report that back to the user before the push completes.
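As a minimal sketch of that pre-push duplicate check (this is not the actual synthetics push implementation, and the journey config shape here is an assumption):

```ts
// Hypothetical journey config shape; the real push command's internal types may differ.
interface JourneyConfig {
  id: string;
  name: string;
}

// Collect any ids or names used by more than one journey, so push can refuse to
// upload the suite and surface the conflict to the user before anything is written.
function findDuplicates(journeys: JourneyConfig[]): { ids: string[]; names: string[] } {
  const duplicated = (values: string[]): string[] => {
    const counts = new Map<string, number>();
    for (const value of values) {
      counts.set(value, (counts.get(value) ?? 0) + 1);
    }
    return [...counts.entries()].filter(([, n]) => n > 1).map(([value]) => value);
  };
  return {
    ids: duplicated(journeys.map((j) => j.id)),
    names: duplicated(journeys.map((j) => j.name)),
  };
}
```

With something like this, push could exit non-zero and print the conflicting ids/names instead of uploading a broken suite.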
I'm realizing this was somewhat misspoken on my part. Because the journeys that failed to run still generate a summary document, they are considered down, and we already have alerting for the down state. I think it's a product question whether our alerting should be more sensitive to the differences between "down" states; perhaps we can filter out down states for journeys that failed to run, and then create a new alert specifically for journeys that failed to run. I'll leave it up to @drewpost to determine whether we want to look into this type of alerting for Uptime, or punt to Synthetics.
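For illustration only, assuming failed-to-run journeys are written as down summaries that also carry an error (the field names here are an assumption, not something confirmed in this thread), a query that isolates that flavour of "down" for a dedicated rule might look roughly like:

```ts
// Hypothetical Elasticsearch query body for a "journey failed to run" rule.
// Assumes such journeys produce a summary document with monitor.status: down plus an error.
const failedToRunQuery = {
  bool: {
    filter: [
      { term: { 'monitor.status': 'down' } },
      { exists: { field: 'summary' } },
      { exists: { field: 'error.message' } },
    ],
  },
};
```

If we went that route, the regular monitor status rule could exclude the same clause so that ordinary "down" alerting doesn't also fire for these documents.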
Thanks for the clarification @dominiqueclarke. I am thinking about...
... I am trying to work out what the behaviour should be if this does fail to execute and is marked as down.
I agree that this issue is no longer relevant. I'm going to close this one so we can move the discussion to #123928 and elsewhere.
The push approach really reduces the surface area of possible errors, so that's generally good news. By the nature of how it works, it already handles all the cases this would have handled, at least in any initial implementation. If there's any disagreement re: closing this, feel free to re-open or ping in Slack.
As an Uptime user, I want to be notified when I have suites errors, to ensure that all the monitors I expect to run are actually running.
Background
As part of #123426, individual journeys in a suite will be identified during Heartbeat's discovery phase and reported to Uptime as suite documents with a list of individual journeys.
If there is an error reported on a journey, that journey will not run, but the error will be displayed on the Suites tab.
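Purely to illustrate the idea (the actual mapping is defined by #123426 and isn't shown in this issue), a suite document carrying the discovered journeys and any discovery errors might look roughly like:

```ts
// Hypothetical shape of a suite document produced during Heartbeat's discovery phase.
// All field names are illustrative, not the real schema.
interface SuiteDocument {
  '@timestamp': string;
  suite: { id: string; name: string };
  journeys: Array<{
    id: string;
    name: string;
    // Present when the journey could not run; this is what the Suites tab would display
    // and what a "new suite error" alert would key on.
    error?: { message: string };
  }>;
}
```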
AC
User should be able to create an alert to notify on new Suites error documents.
TBC: How do we want to handle configuration for this? Will it be a standard alert criterion that we choose, or will we allow the user to customize the alert criteria or thresholds, as with Monitor Status alerts?
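To make that open question more concrete, here is one assumed shape for the rule's parameters (not an existing Kibana rule type); a fixed criterion would hard-code these values, while a customizable rule would expose them in the UI:

```ts
// Hypothetical parameters for a "suite error" alert rule.
interface SuiteErrorRuleParams {
  // Alert when at least this many journeys in a suite report errors.
  errorCountThreshold: number;
  // Look-back window for "new" error documents, e.g. '5m'.
  timeWindow: string;
  // Optionally scope the rule to specific suites.
  suiteIds?: string[];
}
```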