Add alerts to monitor the HelmReleases for cilium and coredns
fiunchinho committed Nov 19, 2024
1 parent 4ac59b0 commit 14b403c
Showing 3 changed files with 38 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add alerts to monitor the `HelmReleases` for `cilium` and `coredns`.

## [4.26.1] - 2024-11-19

### Changed
@@ -59,4 +59,20 @@ spec:
        severity: page
        team: cabbage
        topic: cilium

    - alert: FluxHelmReleaseFailed
      annotations:
        description: |-
          {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}/{{ $labels.cluster_id }} is stuck in Failed state.`}}
        opsrecipe: fluxcd-failing-helmrelease/
      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*(cilium|network-policies)"} > 0
      for: 20m
      labels:
        area: platform
        cancel_if_outside_working_hours: "true"
        cancel_if_kube_state_metrics_down: "true"
        cancel_if_monitoring_agent_down: "true"
        severity: page
        team: cabbage
        topic: cilium
        namespace: |-
          {{`{{ $labels.exported_namespace }}`}}
@@ -25,6 +25,23 @@ spec:
        severity: page
        team: cabbage
        topic: dns
    - alert: FluxHelmReleaseFailed
      annotations:
        description: |-
          {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}/{{ $labels.cluster_id }} is stuck in Failed state.`}}
        opsrecipe: fluxcd-failing-helmrelease/
      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*coredns"} > 0
      for: 20m
      labels:
        area: platform
        cancel_if_outside_working_hours: "true"
        cancel_if_kube_state_metrics_down: "true"
        cancel_if_monitoring_agent_down: "true"
        severity: page
        team: cabbage
        topic: dns
        namespace: |-
          {{`{{ $labels.exported_namespace }}`}}
    - alert: CoreDNSMaxHPAReplicasReached
      expr: |
        (
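
Note on the `name=~` matchers in the two new `FluxHelmReleaseFailed` rules: Prometheus regex matchers are fully anchored, so `.*coredns` selects only HelmReleases whose name ends in `coredns`, and `.*(cilium|network-policies)` only those ending in `cilium` or `network-policies`. The following is a minimal Python sketch of that behaviour, not part of this commit. It approximates the anchoring with `re.fullmatch`; the sample release names are hypothetical and chosen only for illustration.

```python
import re

# Patterns copied from the `name=~` matchers in the two new alert rules.
CILIUM_PATTERN = r".*(cilium|network-policies)"
COREDNS_PATTERN = r".*coredns"


def selector_matches(pattern: str, name: str) -> bool:
    """Approximate a Prometheus `=~` matcher, which must match the whole value."""
    return re.fullmatch(pattern, name) is not None


# Hypothetical HelmRelease names, for illustration only.
sample_names = [
    "demo01-cilium",
    "demo01-network-policies",
    "demo01-coredns",
    "cilium-servicemonitors",
    "coredns-extensions",
]

for name in sample_names:
    print(
        name,
        "cilium rule:", selector_matches(CILIUM_PATTERN, name),
        "coredns rule:", selector_matches(COREDNS_PATTERN, name),
    )
```

Under these assumptions, a release whose name merely starts with `cilium` or `coredns` is not covered by either rule; a trailing `.*` in the pattern would be needed to include it.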