Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics Alerts] Create Metric Threshold Alert Type and Executor #57606

Merged
merged 24 commits into from
Feb 27, 2020

Conversation

Zacqary
Copy link
Contributor

@Zacqary Zacqary commented Feb 13, 2020

Summary

Closes #56427

This is a merge and update of some of the components of #47165

Adds an alert type and executor function for metric threshold alerts. Only supports the Fired action group (not Recovered), as multiple action groups won't be available until 7.8.

The new Alerting APIs support renotify intervals, retrieving current alert states, and other features that were previously missing in the POC, so those are now removed. The executor function is now a relatively simple check of the current metric value.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@Zacqary Zacqary added release_note:enhancement Feature:Alerting Feature:Metrics UI Metrics UI feature v8.0.0 Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.7.0 labels Feb 13, 2020
@Zacqary Zacqary requested a review from a team February 13, 2020 18:45
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@Zacqary Zacqary marked this pull request as ready for review February 13, 2020 18:46
@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 13, 2020

@elasticmachine merge upstream

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick review after @mikecote pointed this PR to me.

Looks good so far, made some comments, obviously hit us up for any questions.

Two relevant PRs ...

The first is the alerting plugin migration from legacy to new platform. Your plugin setup will change a bit. This will likely get merged to master by next week (if not sooner).

The second is the initial step in our first alertType that we will provide built-in with the alerting plugin. It's fairly similar to this one. I've been expecting that other consumers of the alerting plugin will have similar kinds of alertTypes, perhaps over time we can share some implementation code. For now, you might want to take a peek to get some additional ideas, like providing TONS of context properties when scheduling actions. :-)

indexPattern: schema.string(),
}),
},
actionGroups: [FIRED_ACTIONS],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actionGroups parameter is currently defined as a string[], so I'm surprised this type checks, since FIRED_ACTIONS is an object and not a string. Perhaps I'm missing something here tho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed to typecheck before because, according to the docs, actionGroups is typed as Array<{id:string, name:string}>

scheduleActions is still expecting a string though which is why this one's failing:
https://github.com/elastic/kibana/pull/57606/checks?check_run_id=446684113

Need to fix that. I guess scheduleActions is expecting FIRED_ACTIONS.id? Might want to clear up this ambiguity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, yeah, sorry! We did change the types on actionGroups last week or so, I forgot about that. scheduleActions() should just take the id value, and not the {id, name} structure. The {id, name} structure lets us associate an i18n "name" for a particular id, but that name is not used during the execution of alerts, only the id.

I've opened up issue #57853 to clean up the README. Please feel free to add more to it!

});

const { buckets } = result.aggregations.aggregatedIntervals;
const { value } = buckets[buckets.length - 1].aggregatedValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're only using the last bucket, the date histogram doesn't seem to be that useful here. But this function could be repurposed to deal with a "history" of what the alert would have done with old data, eg, for usage in a preview UI. We're doing something similar with the not-yet-merged built-in alerting index threshold alertType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still an ES query newbie, what approach would you recommend for getting this data without using histogram buckets?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can use a simple "metrics" aggregation, which doesn't do any bucketing, eg https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html . You've already got the set of documents to use to generate the metric value you want via the filter, and you don't need those documents further bucketed. That's what I'd try, but I'd also have someone else review it, as it wouldn't surprise me if there was something even easier - because I am also an ES query n00b :-). I actually kinda wonder if an esSQL query would be the easiest thing here (and thinking about it a little for the index threshold as well).

});
}

alertInstance.replaceState({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't seem to be using this alertInstance state being stored at this point, anywhere else (at least in the alert), so you technically don't need this. Perhaps it's for a future use ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that we'd need this for a UI that displays alerts and their current states/historical states. I know that's not implemented yet but I assume it'll use the api/alert/<id>/state API? If it's still up in the air I can remove this and we'll handle that later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alertInstance state is typed as a Record<string, unknown>, and as such, the alerting plugin itself is unlikely to expose any generic UI that could display it. A consumer-specific plugin certainly could. I think it's fine to leave this in, perhaps add a comment that this could be retrieved by a metrics plugin to provide alert-instance specific data in a UI. ie, "future use"

@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 14, 2020

@pmuellr From a quick glance at the index threshold type it looks like we can definitely extend that, or at least import common stuff such as comparators. I'll give it a more thorough review, thanks!

@simianhacker
Copy link
Member

For the checks, instead of creating a query (with aggregations) and sending it directly to Elasticsearch, I would prefer we try and use the same backend API's for the UI. That way the number we display is exactly the same number we check against. That might require us to refactor our API's since alerts comes with it's own "search context" and that will need to be passed in some how.

With that in mind... How does the Alerts system work with Spaces? This is important because if a user creates an alert in a custom space where they've modified the index pattern in our settings, we need to know to honor that pattern.

…t-type

# Conflicts:
#	x-pack/legacy/plugins/infra/index.ts
#	x-pack/legacy/plugins/infra/server/lib/metrics/get_groupings.ts
#	x-pack/legacy/plugins/infra/server/routes/metrics_explorer/lib/get_groupings.ts
#	x-pack/plugins/infra/server/routes/metrics_explorer/lib/get_groupings.ts
@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 19, 2020

Update from Zoom talk with @simianhacker and @phillipb: We want to explore using getTSVBData for this alert function so that we know it's alerting on the same data that the Metrics Explorer receives. This will require hacking a requestContext to send from the executor function (which doesn't use requests), as the underlying get_vis_data is still relying on some legacy platform stuff. We'll write an integration test to notify us if the underlying function changes and breaks our hack.

If this ends up being too fragile or complicated we can use the Snapshot API instead, or just stick with our own ES query if that's also too difficult.

@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 19, 2020

Another update: We're going to test and make sure this PR works as-is and open up a second issue to switch the engine for getting the current metric value.

@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 20, 2020

@elasticmachine merge upstream

@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 26, 2020

@elasticmachine merge upstream

Copy link
Contributor

@phillipb phillipb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@simianhacker simianhacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (code wise)

Are you planning on change the aggregation in a follow up PR to support rate?

@Zacqary
Copy link
Contributor Author

Zacqary commented Feb 27, 2020

@simianhacker Tracking rate in #58621

@Zacqary Zacqary merged commit 390165b into elastic:master Feb 27, 2020
Zacqary added a commit to Zacqary/kibana that referenced this pull request Feb 27, 2020
…stic#57606)

* [Infra] Add basic backend for metric threshold alerts

* Define separate fired/recovered action groups

* Allow alerting on arbitrary search fields besides host.name

* Add list and delete endpoioints

* Add groupBy alerts

* Remove extraneous  routes and SavedObject logic

* Remove additional SavedObject code

* Remove renotify logic from executor

* Fix action group type

* Fix scheduledActions typecheck

* Fix i18n

* Migrate alerting to new platform

* Add alerting to infra dependencies

* Add comment about future use

* Adjust alert params tm names to sync with UI; default to Entire Infrastructure alert

* Add support for between comparator

Co-authored-by: Elastic Machine <[email protected]>
@Zacqary Zacqary deleted the 56427-metric-alert-type branch February 27, 2020 17:57
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Feb 28, 2020
Zacqary added a commit that referenced this pull request Feb 28, 2020
) (#58742)

* [Infra] Add basic backend for metric threshold alerts

* Define separate fired/recovered action groups

* Allow alerting on arbitrary search fields besides host.name

* Add list and delete endpoioints

* Add groupBy alerts

* Remove extraneous  routes and SavedObject logic

* Remove additional SavedObject code

* Remove renotify logic from executor

* Fix action group type

* Fix scheduledActions typecheck

* Fix i18n

* Migrate alerting to new platform

* Add alerting to infra dependencies

* Add comment about future use

* Adjust alert params tm names to sync with UI; default to Entire Infrastructure alert

* Add support for between comparator

Co-authored-by: Elastic Machine <[email protected]>

Co-authored-by: Elastic Machine <[email protected]>
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Feb 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting Feature:Metrics UI Metrics UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.7.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metrics Alerts] Create Metric Threshold Alert Type and Executor
6 participants