Adjust alert timing and severity
Currently, the alert timing (10m) is too aggressive and
the severity (critical) is too high. An individual machine
being down is not critical; any critical components (such
as etcd) already have their own alerting configured.
michaelgugino committed Aug 12, 2020
1 parent ea1b907 commit 730e92a
Showing 1 changed file with 4 additions and 4 deletions.
install/0000_90_machine-api-operator_04_alertrules.yaml: 8 changes (4 additions, 4 deletions)
@@ -15,19 +15,19 @@ spec:
         - alert: MachineWithoutValidNode
           expr: |
             (mapi_machine_created_timestamp_seconds unless on(node) kube_node_info) > 0
-          for: 10m
+          for: 60m
           labels:
-            severity: critical
+            severity: warning
           annotations:
             message: "machine {{ $labels.name }} does not have valid node reference"
       - name: machine-with-no-running-phase
         rules:
         - alert: MachineWithNoRunningPhase
           expr: |
             (mapi_machine_created_timestamp_seconds{phase!="Running"}) > 0
-          for: 10m
+          for: 60m
           labels:
-            severity: critical
+            severity: warning
           annotations:
             message: "machine {{ $labels.name }} is in phase: {{ $labels.phase }}"
       - name: machine-api-operator-metrics-collector-up
Expand Down
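A minimal `promtool test rules` sketch of what the new `for: 60m` means in practice: Prometheus keeps the alert in the pending state until its expression has been true continuously for 60 minutes, and only then fires it, now with `severity: warning`. This assumes the `spec.groups` section of the PrometheusRule has been copied into a plain Prometheus rules file called `machine-api-alertrules.yaml` (a hypothetical file name); the machine name `worker-a`, the phase `Provisioning`, and the timestamp value are made up for illustration.

```yaml
# Hypothetical rules file: spec.groups from the PrometheusRule, saved as a plain rules file.
rule_files:
  - machine-api-alertrules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical machine stuck outside the Running phase for 90 minutes.
      - series: 'mapi_machine_created_timestamp_seconds{name="worker-a", phase="Provisioning"}'
        values: '1597190400+0x90'
    alert_rule_test:
      # Expression true for 59m: alert is still pending, so nothing is firing yet.
      - eval_time: 59m
        alertname: MachineWithNoRunningPhase
        exp_alerts: []
      # Expression true for more than 60m: alert fires at warning severity.
      - eval_time: 61m
        alertname: MachineWithNoRunningPhase
        exp_alerts:
          - exp_labels:
              severity: warning
              name: worker-a
              phase: Provisioning
            exp_annotations:
              message: "machine worker-a is in phase: Provisioning"
```

Run it with `promtool test rules <test-file>.yaml`. Under the previous `for: 10m` and `severity: critical`, both checks would fail: the alert would already be firing at 59m, and it would carry the critical label.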
