rename prometheus-agent inhibitions to monitoring agent to be able to take alloy into account
QuentinBisson committed Oct 29, 2024
1 parent 78fa24f commit 0e7f149
Showing 32 changed files with 157 additions and 153 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Changed
+
+- Rename all `prometheus-agent` related inhibitions to `monitoring-agent` inhibitions.
+
 ## [4.21.1] - 2024-10-25

 ### Fixed
@@ -1459,7 +1463,7 @@ Fix `PromtailRequestsErrors` alerts as promtail retries after some backoff so ac

 - Deprecate `role=master` in favor of `role=control-plane`.
 - Rename alerts containing `Master` with `ControlPlane`
-- Added "cancel_if_prometheus_agent_down" for phoenix alerts ManagementClusterCriticalPodMetricMissing, ManagementClusterDeploymentMissingAWS, WorkloadClusterNonCriticalDeploymentNotSatisfiedKaas
+- Added `cancel_if_prometheus_agent_down` for phoenix alerts ManagementClusterCriticalPodMetricMissing, ManagementClusterDeploymentMissingAWS, WorkloadClusterNonCriticalDeploymentNotSatisfiedKaas

 ## [2.94.0] - 2023-04-26

@@ -24,7 +24,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: phoenix
@@ -39,7 +39,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: phoenix
@@ -58,7 +58,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: phoenix
@@ -154,7 +154,7 @@ spec:
 for: 15m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
 cancel_if_cluster_status_updating: "true"
@@ -59,7 +59,7 @@ spec:
 labels:
 area: kaas
 cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 severity: page
 team: phoenix
 topic: vault
@@ -14,7 +14,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -29,7 +29,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -45,7 +45,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -15,7 +15,7 @@ spec:
 for: 90m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -30,7 +30,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -19,7 +19,7 @@ spec:
 for: 30m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -29,7 +29,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -14,7 +14,7 @@ spec:
 for: 15m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -29,7 +29,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -14,7 +14,7 @@ spec:
 for: 15m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -29,7 +29,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -14,7 +14,7 @@ spec:
 for: 1h
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: notify
 team: {{ include "providerTeam" . }}
@@ -22,7 +22,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -37,7 +37,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -54,7 +54,7 @@ spec:
 labels:
 area: kaas
 cancel_if_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -23,7 +23,7 @@ spec:
 cancel_if_kubelet_down: "true"
 cancel_if_cluster_has_no_workers: "true"
 cancel_if_outside_working_hours: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 severity: page
 team: turtles
 topic: kubernetes
@@ -97,7 +97,7 @@ spec:
 for: 25m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_kube_state_metrics_down: "true"
 cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
 severity: page
@@ -46,7 +46,7 @@ spec:
 for: 30m
 labels:
 area: kaas
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_outside_working_hours: "true"
 severity: page
 team: {{ include "providerTeam" . }}
@@ -5,11 +5,11 @@ metadata:
 creationTimestamp: null
 labels:
 {{- include "labels.common" . | nindent 4 }}
-name: inhibit.prometheus-agent.rules
+name: inhibit.monitoring-agent.rules
 namespace: {{ .Values.namespace }}
 spec:
 groups:
-- name: inhibit.prometheus-agent
+- name: inhibit.monitoring-agent
 rules:
 # this inhibition fires when a cluster is not running prometheus-agent.
 # we retrieve the list of existing cluster IDs from `kube_namespace_created`
@@ -38,7 +38,7 @@ spec:
 )
 ) by (cluster_id)
 labels:
-cluster_is_not_running_prometheus_agent: "true"
+cluster_is_not_running_monitoring_agent: "true"
 area: platform
 team: atlas
 topic: monitoring
@@ -40,7 +40,7 @@ spec:
 cancel_if_cluster_status_deleting: "true"
 cancel_if_cluster_has_no_workers: "true"
 inhibit_kube_state_metrics_down: "true"
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 cancel_if_kubelet_down: "true"
 cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
 severity: page
@@ -47,8 +47,8 @@ spec:
 severity: page
 team: atlas
 topic: observability
-inhibit_prometheus_agent_down: "true"
-cancel_if_cluster_is_not_running_prometheus_agent: "true"
+inhibit_monitoring_agent_down: "true"
+cancel_if_cluster_is_not_running_monitoring_agent: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
 cancel_if_cluster_has_no_workers: "true"
@@ -89,8 +89,8 @@ spec:
 severity: none
 team: atlas
 topic: observability
-inhibit_prometheus_agent_down: "true"
-cancel_if_cluster_is_not_running_prometheus_agent: "true"
+inhibit_monitoring_agent_down: "true"
+cancel_if_cluster_is_not_running_monitoring_agent: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
 ## Page Atlas if prometheus agent is missing shards to send samples to MC prometheus.
@@ -119,8 +119,8 @@ spec:
 severity: page
 team: atlas
 topic: observability
-inhibit_prometheus_agent_down: "true"
-cancel_if_cluster_is_not_running_prometheus_agent: "true"
+inhibit_monitoring_agent_down: "true"
+cancel_if_cluster_is_not_running_monitoring_agent: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
 cancel_if_outside_working_hours: "true"
@@ -150,8 +150,8 @@ spec:
 severity: none
 team: atlas
 topic: observability
-inhibit_prometheus_agent_down: "true"
-cancel_if_cluster_is_not_running_prometheus_agent: "true"
+inhibit_monitoring_agent_down: "true"
+cancel_if_cluster_is_not_running_monitoring_agent: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
 cancel_if_outside_working_hours: "true"
@@ -112,6 +112,6 @@ spec:
 team: atlas
 topic: observability
 cancel_if_outside_working_hours: "true"
-cancel_if_cluster_is_not_running_prometheus_agent: "true"
+cancel_if_cluster_is_not_running_monitoring_agent: "true"
 cancel_if_cluster_status_creating: "true"
 cancel_if_cluster_status_deleting: "true"
@@ -26,7 +26,7 @@ spec:
 cancel_if_kubelet_down: "true"
 cancel_if_cluster_has_no_workers: "true"
 cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
-cancel_if_prometheus_agent_down: "true"
+cancel_if_monitoring_agent_down: "true"
 severity: notify
 team: honeybadger
 topic: releng
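
Note on how these labels are typically consumed: in Alertmanager, an inhibit rule lets a firing "source" alert mute "target" alerts. Under the naming convention visible in this diff, an alert carrying `inhibit_monitoring_agent_down` (or the `cluster_is_not_running_monitoring_agent` label) would be the source, and alerts carrying the matching `cancel_if_*` label would be the targets. The fragment below is a minimal sketch of such a pairing after the rename. It is not taken from this commit: the Alertmanager side is not shown in the diff, and matching on `cluster_id` via `equal` is an assumption.

```yaml
# Hypothetical Alertmanager fragment; not part of this commit.
inhibit_rules:
  # A firing alert labelled inhibit_monitoring_agent_down="true" mutes alerts
  # that opted in with cancel_if_monitoring_agent_down="true".
  - source_matchers:
      - 'inhibit_monitoring_agent_down="true"'
    target_matchers:
      - 'cancel_if_monitoring_agent_down="true"'
    equal:
      - cluster_id  # assumption: source and target are matched per cluster
  # Same pattern for clusters that do not run a monitoring agent at all.
  - source_matchers:
      - 'cluster_is_not_running_monitoring_agent="true"'
    target_matchers:
      - 'cancel_if_cluster_is_not_running_monitoring_agent="true"'
    equal:
      - cluster_id  # assumption
```

If the Alertmanager configuration still matched on the old `*_prometheus_agent_*` label names, the renamed alerts would no longer be inhibited, so both sides of a rename like this need to change together.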