Disable Alerts for Prometheus Remote Writes #146
Signed-off-by: Arjun Naik <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
+ Coverage   64.23%   64.46%   +0.22%
==========================================
  Files           8        8
  Lines         467      470       +3
==========================================
+ Hits          300      303       +3
  Misses        153      153
  Partials       14       14
/lgtm
@cblecker @lisa @jewzaam This PR silences the Prometheus alerts for remote writes to OSD's Observatorium. These alerts haven't been seen before because remote write is not enabled on clusters. While the Observatorium team scales up to handle the load, we want to silence these alerts initially so that we don't fatigue SRE. We plan to re-enable them; that work is tracked in https://issues.redhat.com/browse/OSD-6709
Before removing the hold, please add a card that tracks undoing this change, with the condition being that Observatorium has an SLA. At a minimum, we care about these metrics for reporting the status of clusters via OCM. Once the Observatorium service is better supported, we need to care about when it's offline, at least so we can ship those alerts to the Observatorium team.
/hold
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: arjunrn, jewzaam, jharrington22
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Guess @jharrington22 provided what I asked for right when I asked. Thanks! Removing hold.
/hold cancel
Remote writes of metrics data to Observatorium can occasionally fail due to an outage, which triggers an in-cluster alert. To prevent alert fatigue, let's disable the alerts related to remote writes.
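For illustration only, here is a minimal Go sketch of one way such alerts could be silenced: by routing them to a receiver that has no integrations configured. The diff for this PR is not shown in this conversation, so everything below is an assumption rather than the actual change. The `Route` type, the `nullRoutes` and `receiverFor` helpers, and the receiver names `"null"` and `"pagerduty"` are hypothetical. The three alert names come from the upstream Prometheus alerting rules and may not match the set this PR disables.

```go
package main

import "fmt"

// Route is a simplified, hypothetical stand-in for an Alertmanager route:
// alerts whose labels equal every entry in Match go to Receiver.
type Route struct {
	Receiver string
	Match    map[string]string
}

// remoteWriteAlerts is an assumed list of remote-write alert names,
// taken from the upstream Prometheus alerting rules.
var remoteWriteAlerts = []string{
	"PrometheusRemoteStorageFailures",
	"PrometheusRemoteWriteBehind",
	"PrometheusRemoteWriteDesiredShards",
}

// nullRoutes builds one route per alert name, each pointing at the
// (hypothetical) "null" receiver, which would send no notifications.
func nullRoutes(alertNames []string) []Route {
	routes := make([]Route, 0, len(alertNames))
	for _, name := range alertNames {
		routes = append(routes, Route{
			Receiver: "null",
			Match:    map[string]string{"alertname": name},
		})
	}
	return routes
}

// receiverFor returns the receiver of the first route whose matchers all
// equal the alert's labels, or fallback when no route matches.
func receiverFor(routes []Route, labels map[string]string, fallback string) string {
	for _, r := range routes {
		matched := true
		for key, want := range r.Match {
			if labels[key] != want {
				matched = false
				break
			}
		}
		if matched {
			return r.Receiver
		}
	}
	return fallback
}

func main() {
	routes := nullRoutes(remoteWriteAlerts)
	for _, name := range []string{"PrometheusRemoteWriteBehind", "KubeNodeNotReady"} {
		labels := map[string]string{"alertname": name}
		fmt.Printf("%s -> %s\n", name, receiverFor(routes, labels, "pagerduty"))
	}
}
```

Keeping the alert names in a single list would also make the follow-up tracked in OSD-6709 straightforward: re-enabling the alerts amounts to removing entries from that list.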