Add additional monitoring rules to the PrometheusRule #791
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Signed-off-by: Gerald Nunn <[email protected]>
…ased on this and thus should not change willy-nilly Signed-off-by: Gerald Nunn <[email protected]>
@gnunn1: The following tests failed, say
What type of PR is this?
/kind enhancement
What does this PR do / why we need it:
This PR provides additional alerting rules; specifically, they capture the following situations:
This helps users better monitor Argo CD Applications using the built-in OpenShift monitoring stack.
A couple of additional comments:
The "Progressing for more than 10 minutes" alert might ruffle some feathers, since the health check for Subscriptions leaves them in a Progressing state rather than Suspended. I'm working on adjusting the health check upstream, but it's not there yet. Note that the alert can be silenced if customers find it annoying; we could also lower its severity to info.
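As a rough illustration of the rule being discussed, here is a minimal Go sketch that builds a "Progressing too long" alert with the severity passed in as a parameter, so lowering it to info is a one-argument change. It assumes Argo CD's `argocd_app_info` metric and its `health_status` label; the `AlertRule` type, the `progressingRule` helper and the alert name `ArgoCDHealthProgressingAlert` are hypothetical and are not the operator's actual code.

```go
package main

import "fmt"

// AlertRule is a hypothetical, simplified stand-in for one alerting rule in a
// PrometheusRule rule group; it is not the operator's real type.
type AlertRule struct {
	Alert    string
	Expr     string
	For      string
	Severity string
}

// progressingRule builds a rule that fires once an Application has reported a
// Progressing health status continuously for forDuration. Severity is a
// parameter so it can be lowered (e.g. to "info") if the alert proves noisy.
func progressingRule(namespace, forDuration, severity string) AlertRule {
	return AlertRule{
		Alert:    "ArgoCDHealthProgressingAlert", // hypothetical alert name
		Expr:     fmt.Sprintf(`argocd_app_info{namespace=%q,health_status="Progressing"} > 0`, namespace),
		For:      forDuration,
		Severity: severity,
	}
}

func main() {
	r := progressingRule("openshift-gitops", "10m", "warning")
	fmt.Printf("%s: %s for %s (%s)\n", r.Alert, r.Expr, r.For, r.Severity)
}
```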
I chose to make an Unknown sync status critical, since it means the Application is not syncing properly. However, if folks feel this is too high, it can be dropped to warning; in that case it could be combined with the ArgoCDSyncAlert, since the two would share the same severity.
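To make the "combine if they share a severity" option concrete, the sketch below builds the PromQL expression for a sync-status alert. With one status it uses an exact label matcher; with several it uses a regex matcher (`=~`) so a single rule covers OutOfSync and Unknown together. It assumes the `argocd_app_info` metric's `sync_status` label; the `syncAlertExpr` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// syncAlertExpr builds the expression for a sync-status alert over the given
// statuses. A single status uses an exact matcher; several statuses (only
// sensible when they share one severity) use a regex alternation so one rule
// covers all of them. It returns "" if no statuses are given.
func syncAlertExpr(namespace string, statuses []string) string {
	if len(statuses) == 0 {
		return ""
	}
	matcher := fmt.Sprintf(`sync_status=%q`, statuses[0])
	if len(statuses) > 1 {
		matcher = fmt.Sprintf(`sync_status=~%q`, strings.Join(statuses, "|"))
	}
	return fmt.Sprintf(`argocd_app_info{namespace=%q,%s} > 0`, namespace, matcher)
}

func main() {
	// Separate rules: Unknown (critical) and OutOfSync (warning).
	fmt.Println(syncAlertExpr("openshift-gitops", []string{"Unknown"}))
	fmt.Println(syncAlertExpr("openshift-gitops", []string{"OutOfSync"}))
	// One combined rule, if Unknown were dropped to warning.
	fmt.Println(syncAlertExpr("openshift-gitops", []string{"OutOfSync", "Unknown"}))
}
```

The regex form keeps one alert name and one `for` duration for both states, which is exactly why combining only makes sense once the severities match.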
I wanted to rename ArgoCDSyncAlert to ArgoCDOutOfSyncAlert, but realized that customers may have monitoring and configuration that depend on this name, so I have left it unchanged.
Have you updated the necessary documentation?
The documentation does not mention specific alerts AFAIK, so I do not feel it needs to be covered. However, this should be included in the release notes.
Which issue(s) this PR fixes:
https://issues.redhat.com/browse/GITOPS-4873
Test acceptance criteria:
I updated the unit tests; however, I wonder whether the approach could be improved by parameterizing the MonitoringRules so that the code and the unit tests share the same definitions.
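One way the parameterization idea could look is sketched below: a single table of rule specs is rendered by the reconciler-side function and iterated by the test-side check, so adding an alert means adding one table entry. This is only a sketch; `ruleSpec`, `monitoringRuleSpecs`, `buildRules`, `checkRules` and every alert name other than ArgoCDSyncAlert are hypothetical, and the durations and severities are placeholders rather than the PR's actual values. It assumes the `argocd_app_info` metric and its `sync_status`/`health_status` labels.

```go
package main

import "fmt"

// ruleSpec is a hypothetical shared definition of one alert.
type ruleSpec struct {
	Alert    string
	Selector string // label matchers appended after the namespace matcher
	For      string
	Severity string
}

// monitoringRuleSpecs is illustrative only; values are placeholders.
var monitoringRuleSpecs = []ruleSpec{
	{Alert: "ArgoCDSyncAlert", Selector: `sync_status="OutOfSync"`, For: "5m", Severity: "warning"},
	{Alert: "ArgoCDUnknownSyncAlert", Selector: `sync_status="Unknown"`, For: "5m", Severity: "critical"},
	{Alert: "ArgoCDHealthProgressingAlert", Selector: `health_status="Progressing"`, For: "10m", Severity: "warning"},
}

// rule is a simplified rendered alerting rule.
type rule struct {
	Alert, Expr, For string
	Labels           map[string]string
}

// buildRules renders the shared specs for one Argo CD namespace; the
// reconciler would call this when creating the PrometheusRule.
func buildRules(namespace string) []rule {
	rules := make([]rule, 0, len(monitoringRuleSpecs))
	for _, s := range monitoringRuleSpecs {
		rules = append(rules, rule{
			Alert:  s.Alert,
			Expr:   fmt.Sprintf(`argocd_app_info{namespace=%q,%s} > 0`, namespace, s.Selector),
			For:    s.For,
			Labels: map[string]string{"severity": s.Severity},
		})
	}
	return rules
}

// checkRules is what a unit test could do: walk the same table and compare it
// with the generated rules, so a new spec needs no separate test edit.
func checkRules(got []rule) error {
	if len(got) != len(monitoringRuleSpecs) {
		return fmt.Errorf("got %d rules, want %d", len(got), len(monitoringRuleSpecs))
	}
	for i, s := range monitoringRuleSpecs {
		if got[i].Alert != s.Alert || got[i].For != s.For || got[i].Labels["severity"] != s.Severity {
			return fmt.Errorf("rule %d (%s) does not match its spec", i, s.Alert)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkRules(buildRules("openshift-gitops")))
}
```

The trade-off is that a shared table cannot catch a mistake in the table itself, so a test along these lines would likely still want a few hand-written expectations (for example, the exact expression of ArgoCDSyncAlert, whose name customers may depend on).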
How to test changes / Special notes to the reviewer:
Deploy applications with bad sync and health statuses and verify that OpenShift alerts trigger after the alert duration expires.