
ML Detection of Duration Anomalies #61348

Closed
andrewvc opened this issue Jul 24, 2019 · 13 comments
Assignees: andrewvc
Labels: enhancement, Team:Uptime, test-plan, test-plan-ok, v7.7.0

Comments

@andrewvc (Contributor)

This issue is to track adding ML support to our duration charts on the monitor details page. This is a great way to start integrating ML into Uptime. We'd like to start showing:

  1. The baseline average duration
  2. Spikes and drops in response times, highlighted as anomalous. For this initial MVP we should treat these as warnings, not errors (see the sketch below).
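
For reference, a minimal sketch of what an anomaly detection job for monitor duration could look like, written as the body of an Elasticsearch ML put-job request. The detector, bucket span, and partitioning here are assumptions for illustration, not the configuration that actually shipped; the Heartbeat field names (monitor.duration.us, monitor.id) are the real ones.

```ts
// Illustrative only: a possible body for an Elasticsearch ML put-job request
// that models monitor duration. The detector, bucket span, and partitioning
// are assumptions for this sketch, not the job configuration that shipped.
const durationAnomalyJob = {
  description: 'Uptime: anomalous monitor response times',
  analysis_config: {
    bucket_span: '15m',
    detectors: [
      {
        // high_mean flags buckets whose average duration is unusually high
        function: 'high_mean',
        field_name: 'monitor.duration.us',
        // one model per monitor, so a slow monitor does not mask a fast one
        partition_field_name: 'monitor.id',
      },
    ],
    influencers: ['monitor.id'],
  },
  data_description: {
    time_field: '@timestamp',
  },
};
```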

Open questions:

Do we show these as warnings or as info? Visually, do we communicate this with yellow or a more neutral color?

Implementation Notes
Check with the APM & SIEM ML integrations on how they (a rough sketch follows this list):

  • Check for the license; recommend a Trial if not already on Trial or Platinum
  • Check that there are sufficient resources to run ML
  • Enable ML (with a set of ML jobs)
  • Provide error messages if the request exceeds available resources
  • Stop ML (and delete the ML jobs)
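
A rough sketch of the license gate and job lifecycle, assuming the 7.x @elastic/elasticsearch client is used directly; a Kibana plugin would normally go through the licensing and ML plugins instead, and a real setup would also create and start a datafeed for the job. The function names and wiring are hypothetical.

```ts
import { Client } from '@elastic/elasticsearch';

// Sketch only: license check plus ML job create/open and close/delete,
// using the 7.x @elastic/elasticsearch client directly.
const es = new Client({ node: 'http://localhost:9200' });

export async function setUpDurationJob(jobId: string, jobBody: Record<string, unknown>) {
  // 1. Check the license; ML requires Platinum/Enterprise or an active Trial.
  const { body: licenseResponse } = await es.license.get();
  const licenseType = licenseResponse.license.type;
  if (!['platinum', 'enterprise', 'trial'].includes(licenseType)) {
    throw new Error('Anomaly detection requires a Platinum or Trial license');
  }

  // 2. Create and open the anomaly detection job; ES errors (e.g. not enough
  //    ML node capacity) should be surfaced to the user.
  await es.ml.putJob({ job_id: jobId, body: jobBody });
  await es.ml.openJob({ job_id: jobId });
}

export async function tearDownDurationJob(jobId: string) {
  // 3. When the user disables the integration, stop and delete the job.
  await es.ml.closeJob({ job_id: jobId, force: true });
  await es.ml.deleteJob({ job_id: jobId });
}
```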

@katrin-freihofner this might be good to add to our mocks for our redesigned monitor details page.

@andrewvc changed the title from "ML Anomaly Mappings" to "ML Duration Anomalies" on Aug 8, 2019
@andrewvc changed the title from "ML Duration Anomalies" to "ML Detection of Duration Anomalies" on Aug 8, 2019
@grabowskit

Sample of SIEM/ML UI integration: [screenshot]
Sample of APM/ML UI integration: [screenshot]

@grabowskit

Example of APM chart with integrated ML results: [screenshot]

@katrin-freihofner (Contributor)

We have something very similar for the logs UI planned:

[screenshot]

I think this could work for uptime too.

@katrin-freihofner (Contributor)

This is how it could turn out for the duration chart:

[screenshot]

@Titch990 (Contributor) commented Oct 8, 2019

@katrin-freihofner I just want to say that all these look great!

@Titch990 (Contributor) commented Oct 8, 2019

I also have a couple of comments about the UI text in some of the earlier screenshots above. I'm not sure how far down the line these changes are, and hence whether it is appropriate to comment yet. Also, I wonder who is a good person to raise these points with initially? @gchaps perhaps?

Point 1: I think the text in both the SIEM Anomaly detection settings dialog and the APM Enable anomaly detection dialog could be tightened up a bit. I'm happy to help with this.

Point 2: I'm a bit concerned about the use of the word "Integrations" in the APM/ML UI integration. I may be worrying needlessly, and perhaps the term has already been agreed, but we also already have a different kind of "Integration" in Observability. This other "Integration" will appear in the UI and documentation shortly and may cause confusion.

This other integration is an integration with a third-party service, for example GCP, Docker, or MySQL. It refers to the mechanism by which we set up (or integrate with) a new data source to deliver logs and metrics data. This usage of "integration" seems to be fairly standard across many third-party vendors, not just us.

So in the "Sample of APM/ML UI integration" screenshot above, it's possible that the user may expect the other kind of Observability "integration" rather than what I think is an integration with our machine learning app. I think "Integration" is a very generic term, so perhaps it may be better to choose a more specific term that focuses on what kind of integration this is, or what problem the integration solves for the user, for example "ML integrations" or "Anomaly detection". I think in the Logs app, the Machine learning integration is on a tab called "Analysis", so perhaps that's something else to consider and use consistently across the Observability apps?

@drewpost

This is how it could turn out for the duration chart:

[screenshot]

@katrin-freihofner In this example with multiple series, how would the user know which series the anomaly highlighting pertains to?

@katrin-freihofner (Contributor)

@drewpost As discussed, these (red and yellow) indicators suggest that there is an anomaly. Similar to the Logs UI, there needs to be a tooltip and a button to drill down to the ML view for further details.
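
To make that drill-down concrete, here is a hypothetical sketch of such a tooltip-plus-link marker using EUI components; the component, its props, and how the ML Anomaly Explorer URL is obtained are all assumptions, not the shipped implementation.

```tsx
import React from 'react';
import { EuiButtonEmpty, EuiToolTip } from '@elastic/eui';

// Hypothetical marker for an anomalous bucket on the duration chart: a tooltip
// showing the anomaly score plus a link into the ML app for further details.
interface AnomalyMarkerProps {
  score: number; // ML anomaly score, 0-100
  timestamp: string; // bucket start time, already formatted for display
  mlExplorerUrl: string; // deep link to the ML Anomaly Explorer for this job
}

export const AnomalyMarker: React.FC<AnomalyMarkerProps> = ({ score, timestamp, mlExplorerUrl }) => (
  <EuiToolTip content={`Anomaly score ${Math.round(score)} at ${timestamp}`}>
    <EuiButtonEmpty size="xs" iconType="machineLearningApp" href={mlExplorerUrl}>
      View in Machine Learning
    </EuiButtonEmpty>
  </EuiToolTip>
);
```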

@shahzad31 assigned and unassigned shahzad31 on Feb 10, 2020
@katrin-freihofner (Contributor)

Design issue

@andrewvc transferred this issue from elastic/uptime on Mar 25, 2020
@andrewvc added the enhancement, Team:Uptime, test-plan, and v7.7.0 labels on Mar 25, 2020
@elasticmachine (Contributor)

Pinging @elastic/uptime (Team:uptime)

@andrewvc self-assigned this on Mar 30, 2020
@andrewvc (Contributor, Author)

Fixed in #59785

@andrewvc added the test-plan-ok label on Mar 30, 2020
@andrewvc (Contributor, Author)

Passed test plan perfectly. Seemed to detect anomalies. Creation / linking / deletion of jobs went smoothly.
