[Logs UI] Log threshold ratio alerts #76867

Merged
jasonrhodes merged 11 commits into elastic:master on Sep 29, 2020

Conversation

Kerry350
Contributor

@Kerry350 Kerry350 commented Sep 7, 2020

Summary

This primarily closes #72648; it also closes #73274 and closes #72453. It adds the ability to create ratio-based alerts for the log threshold alert type.

Screenshot 2020-09-07 at 11 11 52

Implementation notes

  • I've stuck with the mathematical principle that dividing by 0 isn't possible; as such, if either value is 0 then the ratio is treated as undefined / indeterminate. Given an undefined ratio, the alert will not fire.

  • Ratio alerts are, realistically, what we already have but with two sets of criteria compared against each other. As such, a lot of the executor logic is reused, but with new ratio results processors and context variables.

  • Threshold annotation rendering is turned off for ratio alerts on chart previews.

  • I've moved around some folders in preparation for us adding more alert types (e.g. the anomaly alert type). The diff seems to have marked some (not all) of the renames as new files (validation, threshold component, etc.), so it looks like more code has been added there than there is in reality.
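
A minimal sketch of the undefined-ratio rule from the first note above. The names `getRatio` and `shouldFire` are hypothetical and are not the identifiers used in this PR, and only a "more than" comparison is shown:

```typescript
// Hypothetical helper names for illustration; not the PR's actual executor code.

// Returns numeratorCount / denominatorCount, or undefined when either
// count is 0 (the ratio is treated as undefined / indeterminate).
function getRatio(numeratorCount: number, denominatorCount: number): number | undefined {
  if (numeratorCount === 0 || denominatorCount === 0) {
    return undefined;
  }
  return numeratorCount / denominatorCount;
}

// An undefined ratio never fires; otherwise the ratio is compared
// against the configured threshold.
function shouldFire(ratio: number | undefined, threshold: number): boolean {
  return ratio !== undefined && ratio > threshold;
}
```

With these assumptions, 10 matching entries against 4 gives a ratio of 2.5, which fires for a threshold of 2, while 10 against 0 gives an undefined ratio and does not fire.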

Testing

  • Alerting requires SSL (yarn start --ssl and yarn es snapshot --ssl)

  • Quite a lot has been touched, so it would be good to verify not only the new ratio alerts but also the standard count-based alerts. Please also try to break things like UI validation.

@Kerry350 Kerry350 added the release_note:enhancement, v8.0.0, Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED, and v7.10.0 labels on Sep 7, 2020
@Kerry350 Kerry350 added this to the Logs UI 7.10 milestone Sep 7, 2020
@Kerry350 Kerry350 requested a review from a team September 7, 2020 11:42
@Kerry350 Kerry350 self-assigned this Sep 7, 2020
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

Member

@jasonrhodes jasonrhodes left a comment

I'm taking a look at this PR this week 👍

Member

@jasonrhodes jasonrhodes left a comment

Looks good so far! I'm going to take a deeper look at the code next, but for now I'm going to add UX-based comments individually.

@jasonrhodes
Member

Screen Shot 2020-09-09 at 1 38 03 PM

Missing some padding/separation between these sections now, I think.

@jasonrhodes
Member

Screen Shot 2020-09-09 at 1 38 08 PM

The loading indicator when opening a new alert is up in the top left corner and not centered (or padded from the edge). Not sure if this is new or not, though.

@jasonrhodes
Member

Screen Shot 2020-09-09 at 2 24 57 PM

I assume this means "more than 75%"? We probably need the % indicator to help with that understanding, both in the form and in the display here. Or use 0.75? Something to help me understand what that number is?

@jasonrhodes
Member

Screen Shot 2020-09-09 at 2 25 20 PM

At first I was admittedly pretty lost with what is going on here. It took me a while to realize that when I select "ratio" from the "When" dropdown, I get two separate condition "sets" added and that I can now add and remove conditions to each set which are then evaluated together as a ratio. I'm not sure I have a ton of great ideas to fix this, but as far as a small initial tweak maybe indentation would help? I do think this type of SQL UX may be starting to break down a bit, though, more generally.

@katrin-freihofner do you have thoughts here?

@Kerry350
Contributor Author

> Missing some padding/separation between these sections now, I think.

Yeah 👍

> The loading indicator when opening a new alert is up in the top left corner and not centered (or padded from the edge). Not sure if this is new or not, though.

This is old, I think this was originally due to the fact we don't really control the initial rendering space, the framework runs our expression code and pops it in there. Things may have changed by now, and positioning that spinner might be easier.

> I assume this means "more than 75%"? We probably need the % indicator to help with that understanding, both in the form and in the display here. Or use 0.75? Something to help me understand what that number is?

This is the straight up ratio value, as it's the ratio of the count of Case A to the count of Case B. Or CaseA / CaseB. Admittedly 75 isn't a great default, it's what the count type uses, something like 2 would be better for twice as many.

It could be converted to percentages.

I went with the actual ratio values as that's what other ratio alerting products seemed to use. With ratios I think most people are accustomed to the 3:1 (numerator:denominator) format, and therefore plugging in a real value.

We could make it clear that this is CaseA / CaseB, here is an example from elsewhere:

Screenshot 2020-09-10 at 12 35 18
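
To make the raw-ratio reading concrete, here is a small sketch of how a CaseA / CaseB threshold would look if it were shown as a percentage. The `ratioToPercent` helper is hypothetical and not part of this PR:

```typescript
// Hypothetical helper for illustration; not code from this PR.
// Expresses a raw ratio (count of Case A / count of Case B) as a percentage.
function ratioToPercent(ratio: number): string {
  return `${(ratio * 100).toFixed(0)}%`;
}

// A threshold of 2 means "twice as many A as B".
const twiceAsMany = ratioToPercent(2);
// The old default of 75 means "75 times as many", not "75%".
const oldDefault = ratioToPercent(75);
// "75%" as a raw ratio would be entered as 0.75.
const threeQuarters = ratioToPercent(0.75);
```

Under this reading, a threshold of 0.75 corresponds to 75%, whereas the previous default of 75 corresponds to 7500%.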

> At first I was admittedly pretty lost with what is going on here. It took me a while to realize that when I select "ratio" from the "When" dropdown, I get two separate condition "sets" added and that I can now add and remove conditions to each set which are then evaluated together as a ratio. I'm not sure I have a ton of great ideas to fix this, but as far as a small initial tweak maybe indentation would help? I do think this type of SQL UX may be starting to break down a bit, though, more generally.

I don't have any real ideas here. I did a literal implementation of what was in #72648, which was: "When ratio of the count of log entries with alert condition1 AND alert condition2 to the count of log entries with alert condition3 AND alert condition4 is threshold within the last 5 minutes then action." When I spoke with @katrin-freihofner, it was agreed there'd be no specific design for this, and that what was in that ticket was enough.
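
As a rough illustration of how that sentence maps onto two condition sets, here is a sketch. Every type, function and field name in it is an assumption made for illustration, and the PR's real parameter shapes may differ:

```typescript
// Hypothetical shapes for illustration only; the PR's real types may differ.
interface Criterion {
  field: string;
  comparator: string;
  value: string | number;
}

interface RatioAlertParams {
  // [numerator set, denominator set]; conditions inside each set are ANDed.
  criteria: [Criterion[], Criterion[]];
  count: { comparator: string; value: number };
  timeSize: number;
  timeUnit: string;
}

// Joins the conditions of one set with AND.
function describeSet(set: Criterion[]): string {
  return set.map((c) => `${c.field} ${c.comparator} ${c.value}`).join(' AND ');
}

// Renders the params in the sentence form used in the ticket.
function describeRatioAlert(params: RatioAlertParams): string {
  const [numerator, denominator] = params.criteria;
  return (
    `When ratio of the count of log entries with ${describeSet(numerator)} ` +
    `to the count of log entries with ${describeSet(denominator)} ` +
    `is ${params.count.comparator} ${params.count.value} ` +
    `within the last ${params.timeSize} ${params.timeUnit}`
  );
}
```

Keeping the two sets as a fixed-length pair makes it explicit which conditions belong to the numerator and which to the denominator, which is the grouping the UI has to convey.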

@katrin-freihofner
Contributor

@jasonrhodes I'm not sure I fully understand the problem. @Kerry350 did not get any design help, as we thought this could move directly to engineering.
From what I see in the screenshots it is what was asked for but I agree, it looks confusing.
I'm wondering if the new expression layout could help here. This is an example:
Screenshot 2020-09-16 at 12 14 19

Also, I think we need to move the chart to the bottom. That said, this is something I would really like to do across Observability and not application by application. Let me know what you are thinking.

Member

@jasonrhodes jasonrhodes left a comment

> From what I see in the screenshots it is what was asked for but I agree, it looks confusing.
> I'm wondering if the new expression layout could help here.

@katrin-freihofner Can we talk about this option at Monday's meeting? This is the kind of solution I'm looking for, for right now -- just something to help orient the user to how these conditions are grouped/organized. We can re-think beyond that after this ticket, if needed.

> Also, I think we need to move the chart to the bottom. That said, this is something I would really like to do across Observability and not application by application. Let me know what you are thinking.

Yeah, this topic has been coming up a lot, and there are valid reasons to keep it where it is but there is also a lot going on with these condition rows. Can we schedule a session at some point next week to talk about Alerting Design overall? I don't think it's urgent but it would be good to start thinking through whether the SQL-style design is going to continue to work or if we need something else, and I think the per-condition previews are one aspect of many there. I'll put something on the calendar and include some folks.

@jasonrhodes
Member

> This is old, I think this was originally due to the fact we don't really control the initial rendering space, the framework runs our expression code and pops it in there. Things may have changed by now, and positioning that spinner might be easier.

If it's not super simple, we can address this in another ticket. Not a huge deal.

@jasonrhodes
Member

> This is the straight up ratio value, as it's the ratio of the count of Case A to the count of Case B. Or CaseA / CaseB. Admittedly 75 isn't a great default, it's what the count type uses, something like 2 would be better for twice as many.
>
> It could be converted to percentages.
>
> I went with the actual ratio values as that's what other ratio alerting products seemed to use. With ratios I think most people are accustomed to the 3:1 (numerator:denominator) format, and therefore plugging in a real value.

Ahh ok this makes sense. Changing the default should remove the confusion I had. Sounds good.

@Kerry350
Contributor Author

@jasonrhodes This is ready for another look, I've addressed all of the feedback here. Recap on the main bits:

  • New design is in place

Screenshot 2020-09-24 at 14 52 48

  • Used a default of 2 vs 75 for ratio alert thresholds
  • Removed all type casting in favour of user-defined type guards
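
For readers unfamiliar with the last point, here is a minimal sketch of replacing a cast with a user-defined type guard. The types and function names are hypothetical and are not the definitions used in this PR:

```typescript
// Hypothetical types for illustration; not the PR's actual definitions.
interface Criterion {
  field: string;
  comparator: string;
  value: string | number;
}
type CountCriteria = Criterion[];
type RatioCriteria = [Criterion[], Criterion[]];
type Criteria = CountCriteria | RatioCriteria;

// User-defined type guard: the `criteria is RatioCriteria` return type lets
// the compiler narrow the union at call sites, so no `as` cast is needed.
function isRatioCriteria(criteria: Criteria): criteria is RatioCriteria {
  return criteria.length === 2 && Array.isArray(criteria[0]) && Array.isArray(criteria[1]);
}

// Counts conditions across either shape, relying on narrowing instead of casts.
function countConditions(criteria: Criteria): number {
  if (isRatioCriteria(criteria)) {
    const [numerator, denominator] = criteria;
    return numerator.length + denominator.length;
  }
  return criteria.length;
}
```

The guard performs a runtime check, so a mismatched shape falls through to the count branch instead of being silently asserted into the wrong type.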

A note on the loading spinner: this is actually the spinner from the alerting framework, and not something we control. The spinner we control is this one (which is shown when we're loading the fields information, if needed):

Screenshot 2020-09-24 at 12 23 44

So I haven't changed anything there.

@Kerry350
Contributor Author

@elasticmachine merge upstream

@Kerry350
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

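@kbn/optimizer bundle module count

| id | value | diff | baseline |
| --- | --- | --- | --- |
| infra | 1126 | +1 | 1125 |

async chunks size

| id | value | diff | baseline |
| --- | --- | --- | --- |
| infra | 4.2MB | +11.3KB | 4.2MB |

page load bundle size

| id | value | diff | baseline |
| --- | --- | --- | --- |
| infra | 137.7KB | +3.2KB | 134.5KB |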

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Member

@jasonrhodes jasonrhodes left a comment

LGTM @Kerry350 ! Thanks so much for working with us on getting this ready to merge!

@jasonrhodes jasonrhodes merged commit 33d051b into elastic:master Sep 29, 2020
jasonrhodes added a commit that referenced this pull request Sep 29, 2020
* Add ratio alerting to log threshold alerts

* Fix i18n

* Move grouped query must not filtering from outer to inner clause

* Use new ratio alerting layout

* Use better defaults for ratio alerts

* Remove div wrapper

* Remove type casting, use user-defined type guards

Co-authored-by: Elastic Machine <[email protected]>

Co-authored-by: Kerry Gallagher <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Labels
Feature:Logs UI, release_note:enhancement, Team:Infra Monitoring UI - DEPRECATED, v7.10.0, v8.0.0
Projects
None yet
5 participants