Uptime - Enable Suites Alerts #123969

Closed
dominiqueclarke opened this issue Jan 27, 2022 · 6 comments
Labels
enhancement New value added to drive a business result Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability

Comments

@dominiqueclarke
Contributor

As an Uptime user, I want to be notified when I have suites errors, to ensure that all the monitors I expect to run are actually running.

Background

As part of #123426, individual journeys in a suite will be identified during Heartbeat's discovery phase and reported to Uptime as suite documents with a list of individual journeys.

If there is an error reported on a journey, that journey will not run, but the error will be displayed on the Suites tab.

AC

User should be able to create an alert to notify on new Suites error documents.

TBC: How do we want to handle configuration for this? Will it be a standard alert criterion that we choose, or will we allow the user to have some customization of the alert criteria or thresholds like Monitor Status alerts?

@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 27, 2022
@dominiqueclarke dominiqueclarke added the Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability label Jan 27, 2022
@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 27, 2022
@dominiqueclarke dominiqueclarke added enhancement New value added to drive a business result needs-team Issues missing a team label labels Jan 27, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 27, 2022
@dominiqueclarke
Contributor Author

@paulb-elastic We've shifted strategy a lot on suites, and I'm not sure if this is still relevant.

The push command itself seems like the appropriate place to report back configuration errors to the user in most cases, preventing them from pushing the configuration entirely and failing that CI step.

For errors that are not caught, we do have elastic/uptime#439 in place, which will display errors when the journey cannot run. What we don't have is an alerting system for this. Should this still be prioritized for 8.3?

cc: @andrewvc @vigneshshanmugam

@paulb-elastic
Contributor

I agree, this no longer seems to fit in with the new push approach.

I've made a comment here about error reporting from running push.

What we don't have is an alerting system for this. Should this still be prioritized for 8.3?

I'm not sure I understand what this refers to. Do you mean that we need to extend elastic/uptime#439 to show errors for externally generated (suite) monitors, not just inline ones?

@dominiqueclarke
Contributor Author

@paulb-elastic As far as I understand, @shahzad31 correct me if I'm wrong, "inline" is a bit of a misnomer for elastic/uptime#439 as the implementation picks up on error documents generated by heartbeat. These errors should be generated when a journey fails to run, regardless of what the source is. So the existing error behavior found in elastic/uptime#439 should "just work" with monitors created via push.

That being said, I think we're at a lower risk of ever having monitors with configuration errors pushed, since the push command could evaluate the code and report any syntax errors ahead of time. Indeed, in a suites model, the engineering team will also likely be using eslint to its fullest. The biggest error we discussed with suites was two monitors having the same id or the same name, and I would expect the push command to report that back to the user before pushing.
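As a rough illustration, the duplicate id/name check described above could look something like the following. This is a hypothetical sketch, not the actual push implementation; `MonitorConfig` and `findDuplicates` are made-up names.

```typescript
// Hypothetical pre-push validation sketch (not how push actually works):
// flag monitors that share an id or a name, which this thread identifies
// as the most likely suite configuration error.
interface MonitorConfig {
  id: string;
  name: string;
}

function findDuplicates(monitors: MonitorConfig[], key: 'id' | 'name'): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const m of monitors) {
    const value = m[key];
    if (seen.has(value)) dupes.add(value);
    seen.add(value);
  }
  return Array.from(dupes);
}
```

Running a check like this at push time would let the CLI reject the suite before anything reaches Uptime, which is the behavior speculated about above.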

What we don't have is an alerting system for this. Should this still be prioritized for 8.3?

I'm realizing this was kinda misspoken on my part. Because the journeys that failed to run still generate a summary document, they are considered down, and we already have alerting for down state.

I think it's a product question whether our alerting should be more sensitive to the differences in "down" states. Perhaps we can filter out down states for journeys that failed to run, and then create a new alert for those journeys. I'll leave it up to @drewpost to determine whether we want to look into this type of alerting for Uptime, or punt to Synthetics.
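For illustration, a filter along these lines might distinguish "down because the check failed" from "down because the journey never ran" in a KQL query. This is a sketch only: `monitor.status` and `error.message` are standard Heartbeat fields, but whether failed-to-run journeys are reliably identifiable this way is an assumption, not something decided in this thread.

```
monitor.status: "down" and error.message: *
```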

@paulb-elastic
Contributor

Thanks for the clarification @dominiqueclarke

I am thinking about...

Because the journeys that failed to run still generate a summary document, they are considered down, and we already have alerting for down state

... I am trying to work out what push does to evaluate the correctness of a journey, and whether a monitor could be pushed and saved but still fail to execute (e.g. due to a syntax error, though I wonder if there are other cases)?

If it does fail to execute and is marked as down, is that even correct (i.e. it would trigger an alert that "your website is down", which is incorrect)? That's what this issue was originally about: being able to have a different type of alert (quite possibly notifying a different team that there's a problem).

@andrewvc
Contributor

I agree that this issue is no longer relevant. I'm going to close this one so we can move the discussion to #123928 and elsewhere.

The push command does not and cannot evaluate the correctness of a journey beyond it:

  1. Being valid JavaScript/TypeScript
  2. Having at least the journey and step syntax more or less correct
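As a rough illustration of what such a shallow check could look like: this is a hypothetical sketch, not how push actually validates. `looksLikeJourney` is a made-up name, and a real implementation would parse the source (e.g. with the TypeScript compiler API) rather than use regexes.

```typescript
// Hypothetical sketch of a shallow pre-push check: verify the source at
// least appears to contain `journey(...)` and `step(...)` calls. A real
// check would parse the code; regexes are used here only for brevity.
function looksLikeJourney(source: string): boolean {
  const hasJourney = /\bjourney\s*\(/.test(source);
  const hasStep = /\bstep\s*\(/.test(source);
  return hasJourney && hasStep;
}
```

A check like this can only catch gross structural problems; runtime failures (a bad selector, an unreachable URL) would still surface only when the journey actually executes, which is the point being made above.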

The push approach really reduces the surface area of possible errors, so that's generally good news. It already, by nature of how it works, handles all the cases this would have handled, at least in any initial implementation.

If there's any disagreement re: closing this, feel free to re-open or ping in Slack.
