Merge branch 'main' into add-mimir-continoustest-alerts
QuentinBisson authored Nov 12, 2024
2 parents d6b4ace + de44204 commit 13df703
Showing 81 changed files with 2,276 additions and 859 deletions.
40 changes: 39 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

+### Removed
+
+- Remove the `mimir.enabled` property to replace it with the MC flavor as all CAPI MCs now run Mimir.
+
+## [4.24.1] - 2024-11-12
+
+### Fixed
+
+- Fix `MonitoringAgentDown` to page when both prometheus-agent and alloy-metrics jobs are missing.
+
+## [4.24.0] - 2024-11-12
+
+### Added
+
+- Add a set of sensible alerts to monitor alloy.
+  - `AlloySlowComponentEvaluations` and `AlloyUnhealthyComponents` to report about alloy component state.
+  - `LoggingAgentDown` to be alerted when the logging agent is down.
+  - `LogForwardingErrors` to be alerted when the `loki.write` component is failing.
+  - `LogReceivingErrors` to be alerted when the `loki.source.api` components of the gateway are failing.
+  - `MonitoringAgentDown` to be alerted when the monitoring agent is down.
+  - `MonitoringAgentShardsNotSatisfied` to be alerted when the monitoring agent is missing any number of desired shards.
+
+### Changed
+
+- Update `DeploymentNotSatisfiedAtlas` to take into account the following components:
+  - `observability-operator`
+  - `alloy-rules`
+  - `observability-gateway`
+- Move all `grafana-cloud` related alerts to their own file.
+- Move all alloy related alerts to the alloy alert file.
+- Rename and move the following alerts as they are not specific to Prometheus:
+  - `PrometheusCriticalJobScrapingFailure` => `CriticalJobScrapingFailure`
+  - `PrometheusJobScrapingFailure` => `JobScrapingFailure`
+  - `PrometheusFailsToCommunicateWithRemoteStorageAPI` => `MetricForwardingErrors`
+
## [4.23.0] - 2024-10-30

### Changed
@@ -19,6 +54,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed

- Fixes the statefulset.rules name as it is currently replacing the deployment.rules alerts.
+- Extends AppCR-related alerts with cancelation for CAPI clusters with unavailable control plane.

## [4.22.0] - 2024-10-29

@@ -3190,7 +3226,9 @@ Fix `PromtailRequestsErrors` alerts as promtail retries after some backoff so ac

- Add existing rules from https://github.com/giantswarm/prometheus-meta-operator/pull/637/commits/bc6a26759eb955de92b41ed5eb33fa37980660f2

-[Unreleased]: https://github.com/giantswarm/prometheus-rules/compare/v4.23.0...HEAD
+[Unreleased]: https://github.com/giantswarm/prometheus-rules/compare/v4.24.1...HEAD
+[4.24.1]: https://github.com/giantswarm/prometheus-rules/compare/v4.24.0...v4.24.1
+[4.24.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.23.0...v4.24.0
[4.23.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.22.0...v4.23.0
[4.22.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.21.1...v4.22.0
[4.21.1]: https://github.com/giantswarm/prometheus-rules/compare/v4.21.0...v4.21.1
6 changes: 3 additions & 3 deletions README.md
@@ -168,11 +168,11 @@ There are 2 kinds of tests on rules:
```
[...]
### Testing platform/atlas/alerting-rules/prometheus-operator.rules.yml
-### promtool check rules /home/marie/github-repo/prometheus-rules/test/hack/output/generated/capi/capa-mimir/platform/atlas/alerting-rules/prometheus-operator.rules.yml
+### promtool check rules /home/marie/github-repo/prometheus-rules/test/hack/output/generated/capi/capa/platform/atlas/alerting-rules/prometheus-operator.rules.yml
### Skipping platform/atlas/alerting-rules/prometheus-operator.rules.yml: listed in test/conf/promtool_ignore
### Testing platform/atlas/alerting-rules/prometheus.rules.yml
-### promtool check rules /home/marie/github-repo/prometheus-rules/test/hack/output/generated/capi/capa-mimir/platform/atlas/alerting-rules/prometheus.rules.yml
-### promtool test rules prometheus.rules.test.yml - capi/capa-mimir
+### promtool check rules /home/marie/github-repo/prometheus-rules/test/hack/output/generated/capi/capa/platform/atlas/alerting-rules/prometheus.rules.yml
+### promtool test rules prometheus.rules.test.yml - capi/capa
[...]
09:06:29 promtool: end (Elapsed time: 1s)
Congratulations! Prometheus rules have been promtool checked and tested
2 changes: 1 addition & 1 deletion helm/prometheus-rules/Chart.yaml
@@ -5,7 +5,7 @@ home: https://github.com/giantswarm/prometheus-rules
icon: https://s.giantswarm.io/app-icons/1/png/default-app-light.png
name: prometheus-rules
appVersion: '0.1.0'
-version: '4.23.0'
+version: '4.24.1'
annotations:
application.giantswarm.io/team: "atlas"
config.giantswarm.io/version: 1.x.x
2 changes: 1 addition & 1 deletion helm/prometheus-rules/templates/alloy-rules-configmap.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "capi" }}
apiVersion: v1
kind: ConfigMap
metadata:
2 changes: 1 addition & 1 deletion helm/prometheus-rules/templates/alloy-rules.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "capi" }}
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
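The Removed entry in the changelog justifies dropping `mimir.enabled` because all CAPI MCs now run Mimir. The minimal Go sketch below checks that reasoning with the standard `text/template` package, which Helm templates are built on. It renders the old and new guards of `alloy-rules.yaml` against a hypothetical values tree and compares the two outputs.

The sketch rests on these assumptions:
- `mimir.enabled` was `true` on every CAPI MC and `false` on every vintage MC.
- The two-line App body is a placeholder, not the full manifest.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Guard from alloy-rules.yaml before this change (App body shortened).
const oldGuard = `{{- if .Values.mimir.enabled }}
apiVersion: application.giantswarm.io/v1alpha1
kind: App
{{- end }}
`

// Guard from alloy-rules.yaml after this change (App body shortened).
const newGuard = `{{- if eq .Values.managementCluster.provider.flavor "capi" }}
apiVersion: application.giantswarm.io/v1alpha1
kind: App
{{- end }}
`

// render executes src against a hypothetical values tree that carries both
// the removed mimir.enabled flag and the management cluster flavor.
func render(src, flavor string, mimirEnabled bool) (string, error) {
	tmpl, err := template.New("guard").Parse(src)
	if err != nil {
		return "", err
	}
	values := map[string]any{
		"Values": map[string]any{
			"mimir": map[string]any{"enabled": mimirEnabled},
			"managementCluster": map[string]any{
				"provider": map[string]any{"flavor": flavor},
			},
		},
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, values); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// guardsAgree reports whether the old and new guards render the same output.
func guardsAgree(flavor string, mimirEnabled bool) (bool, error) {
	before, err := render(oldGuard, flavor, mimirEnabled)
	if err != nil {
		return false, err
	}
	after, err := render(newGuard, flavor, mimirEnabled)
	if err != nil {
		return false, err
	}
	return before == after, nil
}

func main() {
	// Assumed pairing: CAPI MCs run Mimir, vintage MCs do not.
	for _, flavor := range []string{"capi", "vintage"} {
		ok, err := guardsAgree(flavor, flavor == "capi")
		if err != nil {
			panic(err)
		}
		fmt.Printf("flavor=%s agree=%t\n", flavor, ok)
	}
}
```

Under that pairing, both flavors print `agree=true`. A CAPI MC that had `mimir.enabled: false` would now receive the App where it previously did not, so the equivalence holds only as far as the assumption does.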
@@ -5,9 +5,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: aws-load-balancer-controller.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: node.aws.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -5,7 +5,7 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
name: aws.workload-cluster.rules
@@ -6,9 +6,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: capa.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -3,9 +3,9 @@ kind: PrometheusRule
metadata:
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: irsa.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: apiserver.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: apiserver.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ kind: PrometheusRule
metadata:
labels:
{{- include "labels.common" . | nindent 4}}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: capi.management-cluster.rules
namespace: {{.Values.namespace}}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: certificate.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: certificate.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -5,9 +5,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: cluster-autoscaler.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: etcd.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: etcd.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: etcdbackup.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: inhibit.nodes.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: node.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: node.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,7 +4,7 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
name: pods.core.rules
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "management_cluster"
{{- end }}
{{- end }}
name: core.storage.management-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -4,9 +4,9 @@ metadata:
creationTimestamp: null
labels:
{{- include "labels.common" . | nindent 4 }}
-{{- if not .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
cluster_type: "workload_cluster"
{{- end }}
{{- end }}
name: core.storage.workload-cluster.rules
namespace: {{ .Values.namespace }}
spec:
@@ -1,4 +1,4 @@
-{{- if .Values.mimir.enabled }}
+{{- if eq .Values.managementCluster.provider.flavor "capi" }}
apiVersion: v1
kind: ConfigMap
metadata: