Use generic alert for all providers
fiunchinho committed Nov 20, 2024
1 parent 17e2ab2 commit 5ea26c7
Showing 3 changed files with 18 additions and 40 deletions.
CHANGELOG.md: 3 changes (1 addition, 2 deletions)
@@ -9,8 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
-- Add `aws-cloud-components.rules` to monitor the AWS cloud-controller and the ebs-csi-driver.
-- Add `azure-cloud-components.rules` to monitor the Azure cloud-controller and the azure csi drivers.
+- Add `cloud-provider-controller.rules` to monitor the cloud-provider-controller components across providers.
 - Add alert to monitor the HelmRelease for vertical-pod-autoscaler-crd app.
 
 ## [4.26.1] - 2024-11-19

This file was deleted.

@@ -1,31 +1,42 @@
-{{- if eq .Values.managementCluster.provider.kind "capa" }}
-# This rule applies to capa management clusters only
+{{- if eq .Values.managementCluster.provider.flavor "capi" }}
+# This rule applies to CAPI management clusters only
+{{- define "cloudProviderControllerComponents" -}}
+- aws-ebs-csi-driver
+- cloud-provider-aws
+- azure-cloud-controller-manager
+- azure-cloud-node-manager
+- azuredisk-csi-driver
+- azurefile-csi-driver
+- cloud-provider-vsphere
+- cloud-provider-cloud-director
+{{- end }}
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   creationTimestamp: null
   labels:
     {{- include "labels.common" . | nindent 4 }}
-  name: aws-cloud-components.rules
+  name: cloud-provider-controller.rules
   namespace: {{ .Values.namespace }}
 spec:
   groups:
-  - name: aws-cloud-components
+  - name: cloud-provider-controller
     rules:
     - alert: FluxHelmReleaseFailed
       annotations:
         description: |-
           {{`Flux HelmRelease {{ $labels.name }} in ns {{ $labels.exported_namespace }} on {{ $labels.installation }}/{{ $labels.cluster_id }} is stuck in Failed state.`}}
         opsrecipe: fluxcd-failing-helmrelease/
-      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*(aws-ebs-csi-driver|cloud-provider-aws)"} > 0
+      # Here we take the list of components from the cloudProviderControllerComponents template function and transform it into a |-separated string, which is suitable for the PromQL query
+      expr: gotk_reconcile_condition{type="Ready", status="False", kind="HelmRelease", cluster_type="management_cluster", exported_namespace!="flux-giantswarm", name=~".*{{ (include "cloudProviderControllerComponents" . | fromYaml | join "\\|") }}"} > 0
       for: 20m
       labels:
         area: kaas
         cancel_if_outside_working_hours: "true"
         cancel_if_kube_state_metrics_down: "true"
         cancel_if_monitoring_agent_down: "true"
         severity: page
-        team: phoenix
+        team: {{ include "providerTeam" . }}
         topic: managementcluster
         namespace: |-
           {{`{{ $labels.exported_namespace }}`}}
