Send PrometheusNotIngestingSamples for openshift-user-workload-monitoring to null receiver #142

@fahlmant fahlmant commented Feb 23, 2021

In 4.6, this alert fires by default because UWM has no scrape targets. This is fixed in 4.7, but the monitoring team recommended silencing the alert.
https://issues.redhat.com/browse/OSD-6559
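
For readers unfamiliar with how a `Match` map scopes a route, here is a minimal, self-contained sketch of equality matching. It is not the operator's code: the `Route` type, the `matches` and `firstReceiver` helpers, and the receiver names `"null"` and `"pagerduty"` are assumptions for illustration. It only models the idea that a route applies when every listed label equals the alert's label, and that the first matching route wins.

```go
package main

import "fmt"

// Route is a simplified, hypothetical stand-in for an Alertmanager route.
// It models only the receiver name and equality matchers.
type Route struct {
	Receiver string
	Match    map[string]string
}

// matches reports whether every label in r.Match equals the alert's label.
// A label missing from the alert is treated as the empty string.
// A route with an empty Match matches every alert.
func matches(r Route, labels map[string]string) bool {
	for name, want := range r.Match {
		if labels[name] != want {
			return false
		}
	}
	return true
}

// firstReceiver returns the receiver of the first matching route,
// or fallback when no route matches.
func firstReceiver(routes []Route, labels map[string]string, fallback string) string {
	for _, r := range routes {
		if matches(r, labels) {
			return r.Receiver
		}
	}
	return fallback
}

func main() {
	// Assumed receiver names; the real constants live in the operator.
	routes := []Route{
		{Receiver: "null", Match: map[string]string{
			"alertname": "PrometheusNotIngestingSamples",
			"namespace": "openshift-user-workload-monitoring",
		}},
	}

	uwm := map[string]string{
		"alertname": "PrometheusNotIngestingSamples",
		"namespace": "openshift-user-workload-monitoring",
	}
	platform := map[string]string{
		"alertname": "PrometheusNotIngestingSamples",
		"namespace": "openshift-monitoring",
	}

	fmt.Println(firstReceiver(routes, uwm, "pagerduty"))
	fmt.Println(firstReceiver(routes, platform, "pagerduty"))
}
```

Because both labels must match, the same alert name from a different namespace is not silenced in this sketch and falls through to the fallback receiver.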

@iamkirkbater
Contributor

Love this. +1 from me

@codecov

codecov bot commented Feb 23, 2021

Codecov Report

Merging #142 (9e8f3b1) into master (5926812) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #142      +/-   ##
==========================================
+ Coverage   64.16%   64.23%   +0.07%     
==========================================
  Files           8        8              
  Lines         466      467       +1     
==========================================
+ Hits          299      300       +1     
  Misses        153      153              
  Partials       14       14              
Impacted Files Coverage Δ
pkg/controller/secret/secret_controller.go 88.77% <100.00%> (+0.03%) ⬆️

@jharrington22
Contributor

@fahlmant how do we remember to revert this when 4.7 comes around?

@fahlmant
Contributor Author

@jharrington22 We can make a card and put it in the backlog. We'll have to wait until 4.6 is all gone so it will be a while.

@jharrington22
Contributor

> @jharrington22 We can make a card and put it in the backlog. We'll have to wait until 4.6 is all gone so it will be a while.

Let's add a card and I'll +1 this.

@@ -148,6 +148,9 @@ func createPagerdutyRoute() *alertmanager.Route {
// https://issues.redhat.com/browse/OSD-6327
{Receiver: receiverNull, Match: map[string]string{"alertname": "CannotRetrieveUpdates"}},

//https://issues.redhat.com/browse/OSD-6559
{Receiver: receiverNull, Match: map[string]string{"alertname": "PrometheusNotIngestingSamples", "namespace": "openshift-user-workload-monitoring"}},
Member

Why do we get any alerts for this namespace? I expect we need to make sure UWM is up, but is there a specific set of alerts we should allow and discard the rest?

Contributor Author

We're responsible for ensuring the resources in this namespace (Prometheus pods, Thanos pods) are running and working properly. Maybe we can make a list of alerts we care about and scope routing to that. In the short term, this alert will generate noise for every deployment by default, so we should silence it now, then reevaluate the other alerts from this namespace, IMO.
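
One way the allow-list idea could look, as a rough sketch only: page on a short list of alerts from the namespace and send everything else from it to the null receiver. Nothing here comes from the operator's code. The `Route` type, the `uwmRoutes` helper, the receiver names, and the alert name `ExampleUWMPodsDown` are all hypothetical placeholders, and the sketch assumes routes are evaluated in order with the first match winning.

```go
package main

import "fmt"

// Route is a simplified, hypothetical stand-in for an Alertmanager route.
type Route struct {
	Receiver string
	Match    map[string]string
}

const uwmNamespace = "openshift-user-workload-monitoring"

// uwmRoutes builds an ordered route list for the UWM namespace:
// one paging route per allowed alert name, followed by a catch-all
// that sends every other alert from the namespace to nullReceiver.
// Order matters because the first matching route is assumed to win.
func uwmRoutes(allowed []string, pageReceiver, nullReceiver string) []Route {
	routes := make([]Route, 0, len(allowed)+1)
	for _, name := range allowed {
		routes = append(routes, Route{
			Receiver: pageReceiver,
			Match:    map[string]string{"alertname": name, "namespace": uwmNamespace},
		})
	}
	// Catch-all: matches on namespace only, so it must come last.
	routes = append(routes, Route{
		Receiver: nullReceiver,
		Match:    map[string]string{"namespace": uwmNamespace},
	})
	return routes
}

func main() {
	// "ExampleUWMPodsDown" is a placeholder, not a real alert name.
	for _, r := range uwmRoutes([]string{"ExampleUWMPodsDown"}, "pagerduty", "null") {
		fmt.Printf("%s <- %v\n", r.Receiver, r.Match)
	}
}
```

The trade-off is that any new alert added to the namespace upstream would be discarded until someone adds it to the list, which is the opposite failure mode from the per-alert silence in this PR.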

Member

Ack. Let's card up something to address this more long term but understood there's a tactical need.
/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2021
@fahlmant
Contributor Author

fahlmant commented Feb 23, 2021

@jharrington22
Contributor

/lgtm

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fahlmant, jewzaam, jharrington22

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2021
@openshift-merge-robot openshift-merge-robot merged commit 165acb6 into openshift:master Feb 23, 2021