Fix invalid silence causes incomplete updates #3898

@grobinson-grafana grobinson-grafana commented Jun 24, 2024

This commit fixes a bug where an invalid silence causes incomplete updates of existing silences. In some cases, when the replacement silence is invalid, the old silence is expired but the new silence is never created. Instead, the entire operation should be aborted and the original silence left in place. The fix moves validation out of the setSilence method and to the start of the Set method.

@@ -518,9 +518,6 @@ func matchesEmpty(m *pb.Matcher) bool {
}

func validateSilence(s *pb.Silence) error {
if s.Id == "" {
@grobinson-grafana grobinson-grafana Jun 24, 2024


I have removed checks for the Id and UpdatedAt fields from validateSilence. There are two reasons for this:

  1. I have moved validation of silences to the start of the function before the Id and UpdatedAt fields are assigned. This is to fix the issue where an existing silence is expired but the new silence is not created.
  2. These fields are supposed to be set in silences.Set(*pb.Silence) and are not user-editable. We assert in tests that they are present, but we should not need to validate internal fields that the user cannot change.

@@ -611,13 +599,21 @@ func (s *Silences) Set(sil *pb.Silence) error {
defer s.mtx.Unlock()

now := s.nowUTC()
if sil.StartsAt.IsZero() {

@@ -544,9 +541,6 @@ func validateSilence(s *pb.Silence) error {
if s.EndsAt.Before(s.StartsAt) {
return errors.New("end time must not be before start time")
}
if s.UpdatedAt.IsZero() {

We have unit tests that check UpdatedAt is not zero.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-invalid-silence-replacing-silences branch from 0f0637d to da7f867 Compare June 24, 2024 16:21
@@ -468,6 +468,19 @@ func TestSilenceSet(t *testing.T) {
},
}
require.Equal(t, want, s.st, "unexpected state after silence creation")

// Updating an existing silence with an invalid silence should not expire

This test fails in main because the original silence is expired.

@gotjosh gotjosh left a comment


LGTM

@grobinson-grafana

Will rebase #3899 to show that no behavior has changed.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-invalid-silence-replacing-silences branch from da7f867 to 3ebf9a3 Compare June 25, 2024 10:59
@grobinson-grafana

I just rebased onto main to include #3899.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-invalid-silence-replacing-silences branch from 3ebf9a3 to adfa7e2 Compare June 25, 2024 11:05
This commit fixes a bug where an invalid silence causes incomplete
updates of existing silences. This is fixed by moving validation
out of the setSilence method and putting it at the start of the
Set method instead.

Signed-off-by: George Robinson <[email protected]>
@grobinson-grafana grobinson-grafana force-pushed the grobinson/fix-invalid-silence-replacing-silences branch from adfa7e2 to dbac936 Compare June 25, 2024 11:10
@gotjosh gotjosh merged commit 58dc6f8 into prometheus:main Jun 25, 2024
11 checks passed
grobinson-grafana added a commit to grobinson-grafana/alertmanager that referenced this pull request Jun 25, 2024
@grobinson-grafana grobinson-grafana deleted the grobinson/fix-invalid-silence-replacing-silences branch June 25, 2024 15:59
grobinson-grafana added a commit to grafana/mimir that referenced this pull request Jun 26, 2024
This commit fixes the following bugs in silences:

- prometheus/alertmanager#3877
- prometheus/alertmanager#3898
- prometheus/alertmanager#3897

which could cause an existing silence to be deleted/expired
when updating the silence failed. This could be because
the replacing silence exceeded limits or was invalid.
grobinson-grafana added a commit to grafana/mimir that referenced this pull request Jun 26, 2024
grobinson-grafana added a commit to grafana/mimir that referenced this pull request Jun 26, 2024
This commit fixes the following bugs in silences:

- prometheus/alertmanager#3877
- prometheus/alertmanager#3898
- prometheus/alertmanager#3897

which could cause an existing silence to be deleted/expired
when updating the silence failed. This could be because
the replacing silence exceeded limits or was invalid.

additional tests in upstream.
grobinson-grafana added a commit to grafana/mimir that referenced this pull request Jun 26, 2024
dimitarvdimitrov pushed a commit to grafana/mimir that referenced this pull request Jul 2, 2024
(cherry picked from commit 1cfb657)
dimitarvdimitrov added a commit to grafana/mimir that referenced this pull request Jul 2, 2024
* Fixes a number of bugs in silences (#8525)

This commit fixes the following bugs in silences:

- prometheus/alertmanager#3877
- prometheus/alertmanager#3898
- prometheus/alertmanager#3897

which could cause an existing silence to be deleted/expired
when updating the silence failed. This could be because
the replacing silence exceeded limits or was invalid.

(cherry picked from commit 1cfb657)

* Update CHANGELOG.md (#8526)

(cherry picked from commit 36f7af3)

---------

Co-authored-by: George Robinson <[email protected]>
TheMeier pushed a commit to TheMeier/alertmanager that referenced this pull request Sep 29, 2024
SuperQ added a commit that referenced this pull request Oct 16, 2024
* [CHANGE] Deprecate and remove api/v1/ #2970
* [CHANGE] Remove unused feature flags #3676
* [CHANGE] Newlines in smtp password file are now ignored #3681
* [CHANGE] Change compat metrics to counters #3686
* [CHANGE] Do not register compat metrics in amtool #3713
* [CHANGE] Remove metrics from compat package #3714
* [CHANGE] Mark muted alerts #3793
* [FEATURE] Add metric for inhibit rules #3681
* [FEATURE] Support UTF-8 label matchers #3453, #3507, #3523, #3483, #3567, #3568, #3569, #3571, #3595, #3604, #3619, #3658, #3659, #3662, #3668, #3572
* [FEATURE] Add counter to track alerts dropped outside of time_intervals #3565
* [FEATURE] Add date and tz functions to templates #3812
* [FEATURE] Add limits for silences #3852
* [FEATURE] Add time helpers for templates #3863
* [FEATURE] Add auto GOMAXPROCS #3837
* [FEATURE] Add auto GOMEMLIMIT #3895
* [FEATURE] Add Jira receiver integration #3590
* [ENHANCEMENT] Add the receiver name to notification metrics #3045
* [ENHANCEMENT] Add the route ID to uuid #3372
* [ENHANCEMENT] Add duration to the notify success message #3559
* [ENHANCEMENT] Implement webhook_url_file for discord and msteams #3555
* [ENHANCEMENT] Add debug logs for muted alerts #3558
* [ENHANCEMENT] API: Allow the Silences API to use their own 400 response #3610
* [ENHANCEMENT] Add summary to msteams notification #3616
* [ENHANCEMENT] Add context reasons to notifications failed counter #3631
* [ENHANCEMENT] Add optional native histogram support to latency metrics #3737
* [ENHANCEMENT] Enable setting ThreadId for Telegram notifications #3638
* [ENHANCEMENT] Allow webex roomID from template #3801
* [BUGFIX] Add missing integrations to notify metrics #3480
* [BUGFIX] Add missing ttl in Pushover #3474
* [BUGFIX] Fix scheme required for webhook url in amtool #3409
* [BUGFIX] Remove duplicate integration from metrics #3516
* [BUGFIX] Reflect Discord's max length message limits #3597
* [BUGFIX] Fix nil error in warn logs about incompatible matchers #3683
* [BUGFIX] Fix a small number of inconsistencies in compat package logging #3718
* [BUGFIX] Fix log line in featurecontrol #3719
* [BUGFIX] Fix panic in acceptance tests #3592
* [BUGFIX] Fix flaky test TestClusterJoinAndReconnect/TestTLSConnection #3722
* [BUGFIX] Fix crash on errors when url_file is used #3800
* [BUGFIX] Fix race condition in dispatch.go #3826
* [BUGFIX] Fix race conditions in the memory alerts store #3648
* [BUGFIX] Hide config.SecretURL when the URL is incorrect. #3887
* [BUGFIX] Fix invalid silence causes incomplete updates #3898
* [BUGFIX] Fix leaking of Silences matcherCache entries #3930
* [BUGFIX] Close SMTP submission correctly to handle errors #4006

Signed-off-by: SuperQ <[email protected]>
@SuperQ SuperQ mentioned this pull request Oct 16, 2024