Add alerts to monitor the HelmReleases for cilium and coredns (#1433)
* Add alerts to monitor the HelmReleases for cilium and coredns

* Apply suggestions from code review

Co-authored-by: Gerald Pape <[email protected]>

* Fix spacing

---------

Co-authored-by: Gerald Pape <[email protected]>
fiunchinho and ubergesundheit authored Nov 20, 2024
1 parent 969031c commit 92f843f
Showing 3 changed files with 40 additions and 2 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

-- Add alert to monitor the HelmRelease for vertical-pod-autoscaler-crd app.
+- Add alerts to monitor the `HelmReleases` for `cilium` and `coredns`.
+- Add alert to monitor the `HelmRelease` for the `vertical-pod-autoscaler-crd` app.

## [4.26.1] - 2024-11-19

@@ -59,4 +59,22 @@ spec:
        severity: page
        team: cabbage
        topic: cilium

    {{- if eq .Values.managementCluster.provider.flavor "capi" }}
    - alert: FluxHelmReleaseFailed
      annotations:
        description: |-
          {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}-{{ $labels.cluster_id }} is stuck in Failed state.`}}
        opsrecipe: fluxcd-failing-helmrelease/
      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*(cilium|network-policies)"} > 0
      for: 20m
      labels:
        area: platform
        cancel_if_outside_working_hours: "true"
        cancel_if_kube_state_metrics_down: "true"
        cancel_if_monitoring_agent_down: "true"
        severity: page
        team: cabbage
        topic: cilium
        namespace: |-
          {{`{{ $labels.exported_namespace }}`}}
    {{- end -}}
@@ -25,6 +25,25 @@ spec:
        severity: page
        team: cabbage
        topic: dns
    {{- if eq .Values.managementCluster.provider.flavor "capi" }}
    - alert: FluxHelmReleaseFailed
      annotations:
        description: |-
          {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}-{{ $labels.cluster_id }} is stuck in Failed state.`}}
        opsrecipe: fluxcd-failing-helmrelease/
      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*coredns"} > 0
      for: 20m
      labels:
        area: platform
        cancel_if_outside_working_hours: "true"
        cancel_if_kube_state_metrics_down: "true"
        cancel_if_monitoring_agent_down: "true"
        severity: page
        team: cabbage
        topic: dns
        namespace: |-
          {{`{{ $labels.exported_namespace }}`}}
    {{- end }}
    - alert: CoreDNSMaxHPAReplicasReached
      expr: |
        (
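Note on the `name=~` matchers in the two new `FluxHelmReleaseFailed` rules: Prometheus anchors label-matcher regexes at both ends, so `.*coredns` selects only HelmReleases whose name ends in `coredns`, and `.*(cilium|network-policies)` only names ending in `cilium` or `network-policies`. The Python sketch below approximates that behaviour with `re.fullmatch` (Prometheus itself uses RE2, which agrees with Python for these simple patterns). The HelmRelease names in it are hypothetical and are not taken from this commit.

```python
import re

# Patterns copied from the two new alert expressions.
CILIUM_PATTERN = r".*(cilium|network-policies)"
COREDNS_PATTERN = r".*coredns"


def selects(pattern: str, name: str) -> bool:
    """Approximate a Prometheus `name=~"pattern"` matcher.

    Prometheus matches the regex against the whole label value,
    which re.fullmatch reproduces for patterns like these.
    """
    return re.fullmatch(pattern, name) is not None


# Hypothetical HelmRelease names, for illustration only.
candidate_names = [
    "mycluster-cilium",
    "mycluster-network-policies",
    "mycluster-cilium-extras",
    "mycluster-coredns",
    "coredns",
    "mycluster-coredns-extensions",
]

if __name__ == "__main__":
    for name in candidate_names:
        print(
            f"{name}: cilium rule={selects(CILIUM_PATTERN, name)}, "
            f"coredns rule={selects(COREDNS_PATTERN, name)}"
        )
```

Because of the anchoring, a release whose name merely contains `cilium` or `coredns` with a trailing suffix would not be covered by these alerts; whether such releases exist on a given installation is not stated in this commit.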
