Log the GroupKey and alerts in retry #3438

Contributor

@grobinson-grafana grobinson-grafana commented Aug 3, 2023

What this PR does

This pull request updates notify.go to log the GroupKey and the alert fingerprints at the debug level when a notification succeeds, and only the GroupKey at the warning level when a notify attempt fails.

Here is an example of the output at the debug level:

ts=2023-08-03T09:28:36.829Z caller=dispatch.go:163 level=debug component=dispatcher msg="Received alert" alert=foo[c5a7cb1][active]
ts=2023-08-03T09:28:38.725Z caller=dispatch.go:163 level=debug component=dispatcher msg="Received alert" alert=bar[4656093][active]
ts=2023-08-03T09:28:51.830Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}:{alertname=\"foo\"}" msg=flushing alerts=[foo[c5a7cb1][active]]
ts=2023-08-03T09:28:51.984Z caller=notify.go:757 level=debug component=dispatcher receiver=email integration=email[0] aggrGroup="{}:{alertname=\"foo\"}" alerts=[foo[c5a7cb1][active]] msg="Notify success" attempts=1
ts=2023-08-03T09:28:53.726Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}:{alertname=\"bar\"}" msg=flushing alerts=[bar[4656093][active]]
ts=2023-08-03T09:28:53.855Z caller=notify.go:757 level=debug component=dispatcher receiver=email integration=email[0] aggrGroup="{}:{alertname=\"bar\"}" alerts=[bar[4656093][active]] msg="Notify success" attempts=1

And when a notify attempt fails:

ts=2023-08-03T09:34:42.505Z caller=notify.go:745 level=warn component=dispatcher receiver=email integration=email[0] aggrGroup="{}:{alertname=\"foo\"}" msg="Notify attempt failed, will retry later" attempts=1 err="establish connection to server: dial tcp 127.0.0.1:1025: connect: connection refused"
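
The logging pattern described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual code in notify.go: the `logger` type, `notifyLine`, and `errRefused` are hypothetical names, and Alertmanager's real logger and retry loop are not reproduced here. The idea is to attach the group key to the logger once, then add the alerts only on the debug-level success line.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// logger is a hypothetical stand-in for a structured key/value logger.
// It is NOT Alertmanager's logger; it only renders logfmt-style lines.
type logger struct {
	fields []string // alternating keys and values carried by this logger
}

// with returns a copy of the logger that also carries the given pairs.
func (l logger) with(kv ...string) logger {
	return logger{fields: append(append([]string{}, l.fields...), kv...)}
}

// line renders the level, the carried fields, and kv as one line.
// Values containing a space, a quote, or '=' are double-quoted.
func (l logger) line(level string, kv ...string) string {
	all := append(append([]string{"level", level}, l.fields...), kv...)
	parts := make([]string, 0, len(all)/2)
	for i := 0; i+1 < len(all); i += 2 {
		v := all[i+1]
		if strings.ContainsAny(v, " \"=") {
			v = strconv.Quote(v)
		}
		parts = append(parts, all[i]+"="+v)
	}
	return strings.Join(parts, " ")
}

// errRefused is a hypothetical error used to show the failure path.
var errRefused = errors.New("connection refused")

// notifyLine builds the log line for one notify attempt (assumed shape):
// the group key is always attached; the alerts appear only on success.
func notifyLine(l logger, groupKey string, alerts []string, attempts int, err error) string {
	l = l.with("aggrGroup", groupKey)
	n := strconv.Itoa(attempts)
	if err != nil {
		return l.line("warn", "msg", "Notify attempt failed, will retry later",
			"attempts", n, "err", err.Error())
	}
	return l.line("debug", "alerts", "["+strings.Join(alerts, " ")+"]",
		"msg", "Notify success", "attempts", n)
}

func main() {
	base := logger{}.with("receiver", "email", "integration", "email[0]")
	groupKey := `{}:{alertname="foo"}`
	alerts := []string{"foo[c5a7cb1][active]"}
	fmt.Println(notifyLine(base, groupKey, alerts, 1, errRefused))
	fmt.Println(notifyLine(base, groupKey, alerts, 2, nil))
}
```

Keeping the alerts off the warning line keeps retry warnings short, while the group key is still enough to find the matching flush line.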

Motivation

Without this change, it is almost impossible to correlate a Notify success log line with the aggregation group's flush log line, and vice versa. This change makes it easier to debug missing or unexpected notifications when running large Alertmanager installations with tens of thousands of alerts.

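For example, before this change the Notify success lines carried neither the aggrGroup nor the alerts, so they could not be tied back to a flush: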

ts=2023-08-03T09:45:57.934Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}:{alertname=\"foo\"}" msg=flushing alerts=[foo[c5a7cb1][active]]
ts=2023-08-03T09:45:58.139Z caller=notify.go:752 level=debug component=dispatcher receiver=email integration=email[0] msg="Notify success" attempts=1
ts=2023-08-03T09:45:59.801Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}:{alertname=\"bar\"}" msg=flushing alerts=[bar[4656093][active]]
ts=2023-08-03T09:45:59.935Z caller=notify.go:752 level=debug component=dispatcher receiver=email integration=email[0] msg="Notify success" attempts=1
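
To illustrate the kind of correlation that a shared aggrGroup field makes possible, here is a small sketch that groups log lines by that field. It is an assumption-laden example, not a tool shipped with Alertmanager: `parseLogfmt` and `correlate` are hypothetical helpers, and the parser handles only the simple quoting seen in the lines above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLogfmt splits a logfmt-style line into key/value pairs.
// Simplification: inside quotes a backslash just escapes the next byte,
// so sequences such as \n are not decoded.
func parseLogfmt(line string) map[string]string {
	fields := map[string]string{}
	i := 0
	for i < len(line) {
		for i < len(line) && line[i] == ' ' {
			i++ // skip separating spaces
		}
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i >= len(line) || line[i] != '=' {
			continue // bare key without a value; ignore it
		}
		i++ // skip '='
		var val strings.Builder
		if i < len(line) && line[i] == '"' {
			i++ // skip opening quote
			for i < len(line) && line[i] != '"' {
				if line[i] == '\\' && i+1 < len(line) {
					i++ // keep the escaped byte, drop the backslash
				}
				val.WriteByte(line[i])
				i++
			}
			i++ // skip closing quote
		} else {
			for i < len(line) && line[i] != ' ' {
				val.WriteByte(line[i])
				i++
			}
		}
		if key != "" {
			fields[key] = val.String()
		}
	}
	return fields
}

// correlate collects the msg of every line that has an aggrGroup field,
// keyed by that group, in input order. Lines without the field are skipped.
func correlate(lines []string) map[string][]string {
	byGroup := map[string][]string{}
	for _, line := range lines {
		f := parseLogfmt(line)
		if g, ok := f["aggrGroup"]; ok {
			byGroup[g] = append(byGroup[g], f["msg"])
		}
	}
	return byGroup
}

func main() {
	lines := []string{
		`level=debug component=dispatcher aggrGroup="{}:{alertname=\"foo\"}" msg=flushing alerts=[foo[c5a7cb1][active]]`,
		`level=debug component=dispatcher receiver=email integration=email[0] aggrGroup="{}:{alertname=\"foo\"}" alerts=[foo[c5a7cb1][active]] msg="Notify success" attempts=1`,
	}
	for group, msgs := range correlate(lines) {
		fmt.Println(group, msgs)
	}
}
```

Run against lines in the old format, the Notify success entries would be skipped entirely, because they have no aggrGroup field to match on.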

@grobinson-grafana grobinson-grafana force-pushed the grobinson/log-group-key-and-alerts-in-retry branch from 1be3f6b to d6a2b0d Compare August 3, 2023 09:46
Member

@simonpasquier simonpasquier left a comment

lgtm

This commit updates notify.go to log the GroupKey and fingerprints
of an alert at the debug level, and just the GroupKey at the
warning level should the notify attempt fail.

Signed-off-by: George Robinson <[email protected]>
@gotjosh gotjosh force-pushed the grobinson/log-group-key-and-alerts-in-retry branch from d6a2b0d to 64cd8f3 Compare August 7, 2023 09:53
Member

gotjosh commented Aug 7, 2023

Rebased to make sure CI passes.

Member

@gotjosh gotjosh left a comment

LGTM

@gotjosh gotjosh merged commit 638f41c into prometheus:main Aug 7, 2023
1 check passed
Member

gotjosh commented Aug 7, 2023

Thank you very much for your contribution.

radek-ryckowski pushed a commit to goldmansachs/alertmanager that referenced this pull request Nov 6, 2023
qinxx108 pushed a commit to amazon-contributing/alertmanager that referenced this pull request Mar 28, 2024
@grobinson-grafana grobinson-grafana deleted the grobinson/log-group-key-and-alerts-in-retry branch April 16, 2024 14:44