Commit

Add alert for failing helmreleases deploying aws components (#1432)
* Add alert for failing helmreleases deploying aws components

* Add alerts for azure cloud components HelmReleases

* Use generic alert for all providers
fiunchinho authored Nov 21, 2024
1 parent 92f843f commit e4a5df4
Showing 2 changed files with 33 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Add `cloud-provider-controller.rules` to monitor the cloud-provider-controller components across providers.
- Add alerts to monitor the `HelmReleases` for `cilium` and `coredns`.
- Add alert to monitor the `HelmRelease` for the `vertical-pod-autoscaler-crd` app.

@@ -0,0 +1,32 @@
{{- if eq .Values.managementCluster.provider.flavor "capi" }}
# This rule applies to CAPI management clusters only
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    {{- include "labels.common" . | nindent 4 }}
  name: cloud-provider-controller.rules
  namespace: {{ .Values.namespace }}
spec:
  groups:
  - name: cloud-provider-controller
    rules:
    - alert: FluxHelmReleaseFailed
      annotations:
        description: |-
          {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}/{{ $labels.cluster_id }} is stuck in Failed state.`}}
        opsrecipe: fluxcd-failing-helmrelease/
      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*(aws-ebs-csi-driver|cloud-provider-aws|azure-cloud-controller-manager|azure-cloud-node-manager|azuredisk-csi-driver|azurefile-csi-driver|cloud-provider-vsphere|cloud-provider-cloud-director)"} > 0
      for: 20m
      labels:
        area: kaas
        cancel_if_outside_working_hours: "true"
        cancel_if_kube_state_metrics_down: "true"
        cancel_if_monitoring_agent_down: "true"
        severity: page
        team: {{ include "providerTeam" . }}
        topic: managementcluster
        namespace: |-
          {{`{{ $labels.exported_namespace }}`}}
{{- end }}
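
The `name=~"..."` matcher in the expression above is fully anchored by Prometheus, so a HelmRelease name is selected only if it ends with one of the listed component names; the leading `.*` allows any prefix. The following is a minimal Python sketch of that matching behaviour, using `re.fullmatch` to mimic the anchoring. The helper `matches_alert` and the sample names such as `mycluster-cloud-provider-aws` are hypothetical and only illustrate the pattern; they are not taken from the repository.

```python
import re

# Same alternation as the name=~ matcher in the alert expression.
# Prometheus anchors label regexes at both ends, so re.fullmatch is
# used here to approximate that behaviour.
PATTERN = re.compile(
    r".*(aws-ebs-csi-driver|cloud-provider-aws|azure-cloud-controller-manager|"
    r"azure-cloud-node-manager|azuredisk-csi-driver|azurefile-csi-driver|"
    r"cloud-provider-vsphere|cloud-provider-cloud-director)"
)


def matches_alert(name: str) -> bool:
    """Return True if a HelmRelease name would be selected by the matcher.

    Hypothetical helper for illustration only.
    """
    return PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Sample names are assumptions, not real releases.
    for candidate in (
        "mycluster-cloud-provider-aws",
        "cloud-provider-vsphere",
        "mycluster-cloud-provider-aws-extra",
        "mycluster-cilium",
    ):
        print(candidate, matches_alert(candidate))
```

Because of the anchoring, a name with a suffix after the component name is not matched, while any cluster-specific prefix is.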
