-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MaxSilences limit causes incomplete updates of existing silences #3877
Fix MaxSilences limit causes incomplete updates of existing silences #3877
Conversation
de674ea
to
41e148f
Compare
72bbc68
to
4e027b7
Compare
This commit fixes a bug where the MaxSilences limit can cause an incomplete update of existing silences, where the old silence can be expired but the new silence cannot be created because it would exceeded the maximum number of silences. Signed-off-by: George Robinson <[email protected]>
4e027b7
to
8de387e
Compare
@@ -564,6 +564,13 @@ func (s *Silences) getSilence(id string) (*pb.Silence, bool) { | |||
return msil.Silence, true | |||
} | |||
|
|||
func (s *Silences) toMeshSilence(sil *pb.Silence) *pb.MeshSilence { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this to a function for two reasons:
-
It allows us to check the full error in unit tests. For example:
require.EqualError(t, s.Set(sil3), fmt.Sprintf("silence exceeded maximum size: %d bytes (limit: 4096 bytes)", s.toMeshSilence(sil3).Size()))
-
We will use it in the next PR where we fix the same bug but for maximum size limits and validation errors.
return fmt.Errorf("expire previous silence: %w", err) | ||
} | ||
} | ||
if ok && canUpdate(prev, sil, now) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code should run before checking the MaxSilences()
limit because we don't want to prevent updating an existing silence if we are at the limit and the update does not require expiring and creating a new silence.
@@ -631,6 +627,13 @@ func (s *Silences) Set(sil *pb.Silence) error { | |||
} | |||
} | |||
|
|||
if ok && getState(prev, s.nowUTC()) != types.SilenceStateExpired { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire change here is moving the expiration of the existing silence to run after checking the MaxSilences()
limit. This ensures that an existing silence is not expired if we are at the limit, as what would happen is the existing silence is expired and then the new silence cannot be created.
StartsAt: time.Now(), | ||
EndsAt: time.Now().Add(5 * time.Minute), | ||
} | ||
require.EqualError(t, s.Set(sil2), "exceeded maximum number of silences: 1 (limit: 1)") | ||
require.Equal(t, "", sil2.Id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed all checks for require.Equal(t, "", id)
in this test because it's unrelated to limits.
StartsAt: time.Now(), | ||
EndsAt: time.Now().Add(5 * time.Minute), | ||
} | ||
err = s.Set(sil3) | ||
require.Error(t, err) | ||
// Do not check the exact size as it can change between consecutive runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this.
require.NotEqual(t, "", sil3.Id) | ||
require.EqualError(t, s.Set(sil3), fmt.Sprintf("silence exceeded maximum size: %d bytes (limit: 4096 bytes)", s.toMeshSilence(sil3).Size())) | ||
|
||
// Should be able to insert sil4. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added additional tests here to cover the following cases:
- Updating an existing silence without modifications.
- Adding a comment to an existing silence.
- Failing to update an existing silence because the maximum number of silences has been exceeded should not expire the existing silence.
Signed-off-by: George Robinson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: George Robinson <[email protected]>
…rometheus#3877) * Fix MaxSilences limit causes incomplete updates of existing silences This commit fixes a bug where the MaxSilences limit can cause an incomplete update of existing silences, where the old silence can be expired but the new silence cannot be created because it would exceeded the maximum number of silences. Signed-off-by: George Robinson <[email protected]> --------- Signed-off-by: George Robinson <[email protected]>
This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid.
This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid.
This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid. additional tests in upstream.
This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid.
This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid. (cherry picked from commit 1cfb657)
* Fixes a number of bugs in silences (#8525) This commit fixes the following bugs in silences: - prometheus/alertmanager#3877 - prometheus/alertmanager#3898 - prometheus/alertmanager#3897 which could cause an existing silence to be deleted/expired when updating the silence failed. This could be because the replacing silence exceeded limits or was invalid. (cherry picked from commit 1cfb657) * Update CHANGELOG.md (#8526) (cherry picked from commit 36f7af3) --------- Co-authored-by: George Robinson <[email protected]>
…rometheus#3877) * Fix MaxSilences limit causes incomplete updates of existing silences This commit fixes a bug where the MaxSilences limit can cause an incomplete update of existing silences, where the old silence can be expired but the new silence cannot be created because it would exceeded the maximum number of silences. Signed-off-by: George Robinson <[email protected]> --------- Signed-off-by: George Robinson <[email protected]>
* Release v0.28.0-rc.0 * [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879 * [FEATURE] Add a new Microsoft Teams integration based on Flows #4024 * [FEATURE] Add a new Rocket.Chat integration #3600 * [FEATURE] Add a new Jira integration #3590 #3931 * [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895 * [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837 * [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877 * [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792 * [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007 * [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961 * [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062 * [ENHANCEMENT] Build using go 1.23 #4071 * [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732 * [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801 * [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638 * [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863 * [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812 * [ENHANCEMENT] Latency metrics now support native histograms. #3737 * [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006 * [BUGFIX] The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027 * [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930 * [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887 * [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648 * [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826 * [BUGFIX] Fix version in APIv1 deprecation notice. #3815 * [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800 * [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803 * [BUGFIX] Fix deadlock on the alerts memory store. #3715 * [BUGFIX] Fix `amtool template render` when using the default values. #3725 * [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745 --------- Signed-off-by: SuperQ <[email protected]> Signed-off-by: gotjosh <[email protected]> Co-authored-by: gotjosh <[email protected]>
This commit fixes a bug where the MaxSilences limit can cause an incomplete update of existing silences, where the old silence can be expired but the new silence cannot be created because it would exceeded the maximum number of silences.