-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the receiver name to notification metrics #3045
Add the receiver name to notification metrics #3045
Conversation
Could we have this behind a cli flag? Cli flag because I think we should issue it consistently over the lifetime of the app. |
Sorry for the hiatus. Will push a CLI flag today |
1ed8b42
to
35d0483
Compare
3c6c530
to
769396f
Compare
Fixed this up and it's ready for another review |
@roidelapluie could you pls take another look? |
This commit adds in a second label to the notify family of metrics (e.g. numTotalFailedNotifications) - the receiver name. This allows disambiguating which receiver is failing when one has many receivers with the same integration type Signed-off-by: sinkingpoint <[email protected]>
Alertmanager doesn't actually have a feature flag system yet, so this invents the universe, mostly by cribbing off the Prometheus model. Here we introduce a new CLI flag `--enable-feature` that we parse into a new `featureConfigs` struct. We also update the PipelineBuilder to take that flag and pass it through the metrics infrastructure. This also required a bit of deduplication to make finding the correct labelset (with or without the receiver name) easier in every place that it's used, and updating the tests to use that same mechanism. Signed-off-by: sinkingpoint <[email protected]>
…ation code Signed-off-by: gotjosh <[email protected]>
769396f
to
acb5eaa
Compare
Signed-off-by: gotjosh <[email protected]>
Thank you very much for your contribution - you've been waiting for a long time on this so instead of dragging it back and forth, and went ahead and implemented a couple of suggestions I wanted on this PR. The bulk of the change is around having the "feature flag management" on its package and applying a couple of patterns to keep them tidy for future additions. The other big change is the fact that we can initialize ahead of time the metrics for the given receiver names. I'd love if @simonpasquier can take a look as I've written a big chunk of this code. |
Signed-off-by: gotjosh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main remark is that old label values would be retained after a config reload in case some receivers are deleted/renamed. I guess that we need DeleteLabelValues() to prune them.
Signed-off-by: gotjosh <[email protected]>
I can't use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you very much for your contribution @sinkingpoint. |
@gotjosh Do you have an estimate for when the next official Alertmanager container image will be available that includes this change? |
@gotjosh Is there any update on when the new Alertmanager image will be available? |
Have you tested out the current rc.0 https://github.com/prometheus/alertmanager/releases/tag/v0.27.0-rc.0? I'm planning on promoting the the RC to 27.0 by the end of the week if nothing major is reported. |
* [CHANGE] Deprecate and remove api/v1/ #2970 * [CHANGE] Remove unused feature flags #3676 * [CHANGE] Newlines in smtp password file are now ignored #3681 * [CHANGE] Change compat metrics to counters #3686 * [CHANGE] Do not register compat metrics in amtool #3713 * [CHANGE] Remove metrics from compat package #3714 * [CHANGE] Mark muted alerts #3793 * [FEATURE] Add metric for inhibit rules #3681 * [FEATURE] Support UTF-8 label matchers #3453, #3507, #3523, #3483, #3567, #3568, #3569, #3571, #3595, #3604, #3619, #3658, #3659, #3662, #3668, 3572 * [FEATURE] Add counter to track alerts dropped outside of time_intervals #3565 * [FEATURE] Add date and tz functions to templates #3812 * [FEATURE] Add limits for silences #3852 * [FEATURE] Add time helpers for templates #3863 * [FEATURE] Add auto GOMAXPROCS #3837 * [FEATURE] Add auto GOMEMLIMIT #3895 * [FEATURE] Add Jira receiver integration #3590 * [ENHANCEMENT] Add the receiver name to notification metrics #3045 * [ENHANCEMENT] Add the route ID to uuid #3372 * [ENHANCEMENT] Add duration to the notify success message #3559 * [ENHANCEMENT] Implement webhook_url_file for discord and msteams #3555 * [ENHANCEMENT] Add debug logs for muted alerts #3558 * [ENHANCEMENT] API: Allow the Silences API to use their own 400 response #3610 * [ENHANCEMENT] Add summary to msteams notification #3616 * [ENHANCEMENT] Add context reasons to notifications failed counter #3631 * [ENHANCEMENT] Add optional native histogram support to latency metrics #3737 * [ENHANCEMENT] Enable setting ThreadId for Telegram notifications #3638 * [ENHANCEMENT] Allow webex roomID from template #3801 * [BUGFIX] Add missing integrations to notify metrics #3480 * [BUGFIX] Add missing ttl in pushhover #3474 * [BUGFIX] Fix scheme required for webhook url in amtool #3409 * [BUGFIX] Remove duplicate integration from metrics #3516 * [BUGFIX] Reflect Discord's max length message limits #3597 * [BUGFIX] Fix nil error in warn logs about incompatible matchers #3683 * [BUGFIX] Fix a small number of inconsistencies in compat package logging #3718 * [BUGFIX] Fix log line in featurecontrol #3719 * [BUGFIX] Fix panic in acceptance tests #3592 * [BUGFIX] Fix flaky test TestClusterJoinAndReconnect/TestTLSConnection #3722 * [BUGFIX] Fix crash on errors when url_file is used #3800 * [BUGFIX] Fix race condition in dispatch.go #3826 * [BUGFIX] Fix race conditions in the memory alerts store #3648 * [BUGFIX] Hide config.SecretURL when the URL is incorrect. #3887 * [BUGFIX] Fix invalid silence causes incomplete updates #3898 * [BUGFIX] Fix leaking of Silences matcherCache entries #3930 * [BUGFIX] Close SMTP submission correctly to handle errors #4006 Signed-off-by: SuperQ <[email protected]>
Solves #3012
This PR adds in a second label to all the Notification metrics, so that we not only have the type of notification, but all the receiver_name that the notification belongs to.
Because of the proliferation of webhook notifiers (as recommended by the docs), this is necessary to be able to properly diagnose failures from metrics - currently we get one metric for all web hooks, regardless of the fact that they might represent completely different notification pathways.
This is technically a breaking change, so this might have to wait for a bigger release, if its even worthwhile at all