-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metrics Alerts] Create Metric Threshold Alert Type and Executor #57606
Conversation
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui) |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick review after @mikecote pointed this PR to me.
Looks good so far, made some comments, obviously hit us up for any questions.
Two relevant PRs ...
- Alerting plugin migrate to Kibana platform - PR Alerting plugin migrate to Kibana platform #57635
- [alerting] initial index threshold alertType and supporting APIs - PR [alerting] initial index threshold alertType and supporting APIs #57030
The first is the alerting plugin migration from legacy to new platform. Your plugin setup will change a bit. This will likely get merged to master by next week (if not sooner).
The second is the initial step in our first alertType that we will provide built-in with the alerting plugin. It's fairly similar to this one. I've been expecting that other consumers of the alerting plugin will have similar kinds of alertTypes, perhaps over time we can share some implementation code. For now, you might want to take a peek to get some additional ideas, like providing TONS of context properties when scheduling actions. :-)
indexPattern: schema.string(), | ||
}), | ||
}, | ||
actionGroups: [FIRED_ACTIONS], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actionGroups
parameter is currently defined as a string[]
, so I'm surprised this type checks, since FIRED_ACTIONS
is an object and not a string. Perhaps I'm missing something here tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This failed to typecheck before because, according to the docs, actionGroups
is typed as Array<{id:string, name:string}>
scheduleActions
is still expecting a string
though which is why this one's failing:
https://github.com/elastic/kibana/pull/57606/checks?check_run_id=446684113
Need to fix that. I guess scheduleActions
is expecting FIRED_ACTIONS.id
? Might want to clear up this ambiguity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops, yeah, sorry! We did change the types on actionGroups
last week or so, I forgot about that. scheduleActions()
should just take the id
value, and not the {id, name}
structure. The {id, name}
structure lets us associate an i18n "name" for a particular id, but that name is not used during the execution of alerts, only the id
.
I've opened up issue #57853 to clean up the README. Please feel free to add more to it!
}); | ||
|
||
const { buckets } = result.aggregations.aggregatedIntervals; | ||
const { value } = buckets[buckets.length - 1].aggregatedValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're only using the last bucket, the date histogram doesn't seem to be that useful here. But this function could be repurposed to deal with a "history" of what the alert would have done with old data, eg, for usage in a preview UI. We're doing something similar with the not-yet-merged built-in alerting index threshold alertType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still an ES query newbie, what approach would you recommend for getting this data without using histogram buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can use a simple "metrics" aggregation, which doesn't do any bucketing, eg https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html . You've already got the set of documents to use to generate the metric value you want via the filter, and you don't need those documents further bucketed. That's what I'd try, but I'd also have someone else review it, as it wouldn't surprise me if there was something even easier - because I am also an ES query n00b :-). I actually kinda wonder if an esSQL query would be the easiest thing here (and thinking about it a little for the index threshold as well).
}); | ||
} | ||
|
||
alertInstance.replaceState({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't seem to be using this alertInstance state being stored at this point, anywhere else (at least in the alert), so you technically don't need this. Perhaps it's for a future use ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that we'd need this for a UI that displays alerts and their current states/historical states. I know that's not implemented yet but I assume it'll use the api/alert/<id>/state
API? If it's still up in the air I can remove this and we'll handle that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alertInstance state is typed as a Record<string, unknown>
, and as such, the alerting plugin itself is unlikely to expose any generic UI that could display it. A consumer-specific plugin certainly could. I think it's fine to leave this in, perhaps add a comment that this could be retrieved by a metrics plugin to provide alert-instance specific data in a UI. ie, "future use"
@pmuellr From a quick glance at the index threshold type it looks like we can definitely extend that, or at least import common stuff such as comparators. I'll give it a more thorough review, thanks! |
For the checks, instead of creating a query (with aggregations) and sending it directly to Elasticsearch, I would prefer we try and use the same backend API's for the UI. That way the number we display is exactly the same number we check against. That might require us to refactor our API's since alerts comes with it's own "search context" and that will need to be passed in some how. With that in mind... How does the Alerts system work with Spaces? This is important because if a user creates an alert in a custom space where they've modified the index pattern in our settings, we need to know to honor that pattern. |
…t-type # Conflicts: # x-pack/legacy/plugins/infra/index.ts # x-pack/legacy/plugins/infra/server/lib/metrics/get_groupings.ts # x-pack/legacy/plugins/infra/server/routes/metrics_explorer/lib/get_groupings.ts # x-pack/plugins/infra/server/routes/metrics_explorer/lib/get_groupings.ts
Update from Zoom talk with @simianhacker and @phillipb: We want to explore using If this ends up being too fragile or complicated we can use the Snapshot API instead, or just stick with our own ES query if that's also too difficult. |
Another update: We're going to test and make sure this PR works as-is and open up a second issue to switch the engine for getting the current metric value. |
@elasticmachine merge upstream |
…nto 56427-metric-alert-type
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (code wise)
Are you planning on change the aggregation in a follow up PR to support rate?
@simianhacker Tracking rate in #58621 |
…stic#57606) * [Infra] Add basic backend for metric threshold alerts * Define separate fired/recovered action groups * Allow alerting on arbitrary search fields besides host.name * Add list and delete endpoioints * Add groupBy alerts * Remove extraneous routes and SavedObject logic * Remove additional SavedObject code * Remove renotify logic from executor * Fix action group type * Fix scheduledActions typecheck * Fix i18n * Migrate alerting to new platform * Add alerting to infra dependencies * Add comment about future use * Adjust alert params tm names to sync with UI; default to Entire Infrastructure alert * Add support for between comparator Co-authored-by: Elastic Machine <[email protected]>
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync. |
) (#58742) * [Infra] Add basic backend for metric threshold alerts * Define separate fired/recovered action groups * Allow alerting on arbitrary search fields besides host.name * Add list and delete endpoioints * Add groupBy alerts * Remove extraneous routes and SavedObject logic * Remove additional SavedObject code * Remove renotify logic from executor * Fix action group type * Fix scheduledActions typecheck * Fix i18n * Migrate alerting to new platform * Add alerting to infra dependencies * Add comment about future use * Adjust alert params tm names to sync with UI; default to Entire Infrastructure alert * Add support for between comparator Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Elastic Machine <[email protected]>
Summary
Closes #56427
This is a merge and update of some of the components of #47165
Adds an alert type and executor function for metric threshold alerts. Only supports the Fired action group (not Recovered), as multiple action groups won't be available until 7.8.
The new Alerting APIs support renotify intervals, retrieving current alert states, and other features that were previously missing in the POC, so those are now removed. The executor function is now a relatively simple check of the current metric value.
Checklist
Delete any items that are not applicable to this PR.
[ ] Documentation was added for features that require explanation or tutorials[ ] Unit or functional tests were updated or added to match the most common scenarios[ ] This was checked for keyboard-only and screenreader accessibility[ ] This renders correctly on smaller devices using a responsive layout. (You can test this in your browser[ ] This was checked for cross-browser compatibility, including a check against IE11For maintainers