Add mimir.enabled flag
QuentinBisson committed Mar 28, 2024
1 parent 31301ab commit 12e81b2
Showing 3 changed files with 8 additions and 5 deletions.
Changed file 1 of 3
@@ -1,3 +1,4 @@
+{{- if .Values.mimir.enabled }}
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
@@ -16,14 +17,11 @@ spec:
 expr: up{app="mimir"} > 0
 labels:
 area: "empowerment"
-# TODO(@team-atlas): do we need this label? Let's test once we use mimir alertmanager
 installation: {{ .Values.managementCluster.name }}
+# TODO(@team-atlas): We need this label as long as we have the old and new heartbeats. Let's remove once the legacy monitoring is gone
+type: "mimir-heartbeat"
 team: "atlas"
 topic: "observability"
-# TODO(@team-atlas): do we need this label? Let's test once we use mimir alertmanager
-type: "heartbeat"
-# TODO(@team-atlas): remove once we use mimir alertmanager
-namespace: "monitoring" # Needed due to https://github.com/prometheus-operator/prometheus-operator/issues/3737
 # Coming from https://github.com/giantswarm/giantswarm/issues/30124
 # This alert ensures Mimir containers are not restarting too often (flappiness).
 # If it is not the case, this can incur high costs by cloud providers (s3 api calls are quite expensive).
@@ -86,3 +84,4 @@ spec:
 severity: page
 team: atlas
 topic: observability
+{{- end }}
Changed file 2 of 3
@@ -9,6 +9,7 @@ spec:
 groups:
 - name: observability
 rules:
+{{- if not .Values.mimir.enabled }}
 - alert: "Heartbeat"
 expr: up{app="prometheus",instance!="prometheus-agent"}
 labels:
@@ -20,6 +21,7 @@ spec:
 namespace: "monitoring" # Needed due to https://github.com/prometheus-operator/prometheus-operator/issues/3737
 annotations:
 description: This alert is used to ensure the entire alerting pipeline is functional.
+{{- end }}
 - alert: "MatchingNumberOfPrometheusAndCluster"
 annotations:
 description: This alert is used to ensure we have as many workload cluster prometheus as we have workload cluster CR.
Changed file 3 of 3
@@ -1,3 +1,4 @@
+{{- if .Values.mimir.enabled }}
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
@@ -577,3 +578,4 @@ spec:
 - expr: |
 sum by(cluster, namespace, pod) (rate(cortex_ingester_ingested_samples_total[1m]))
 record: cluster_namespace_pod:cortex_ingester_ingested_samples_total:rate1m
+{{- end }}
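
A minimal sketch of how the new flag might be set in the chart's values file. The `mimir.enabled` and `managementCluster.name` keys are the ones the templates in this commit read; the values shown (`false`, `example-mc`) are illustrative assumptions, not defaults taken from this commit.

```yaml
# values.yaml (illustrative; the values below are assumptions, not the chart's defaults)
managementCluster:
  name: example-mc   # hypothetical name, rendered into the `installation` label
mimir:
  # false: the two PrometheusRule files guarded by the flag are not rendered,
  #        and the legacy "Heartbeat" alert is kept.
  # true:  the guarded files are rendered and the legacy "Heartbeat" alert is skipped.
  enabled: false
```

With this layout, the legacy heartbeat and the Mimir heartbeat are never rendered together, since one is wrapped in `if not .Values.mimir.enabled` and the other in `if .Values.mimir.enabled`.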
