Check creation CAPI cluster creation time on `LatestETCDBackup2DaysOld` (#1015)

* check creation CAPI cluster creation time on `LatestETCDBackup2DaysOld`

* Update CHANGELOG.md
njuettner authored Jan 25, 2024
1 parent 5082711 commit 500b38d
Showing 2 changed files with 11 additions and 4 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Check creation CAPI cluster creation time before paging `LatestETCDBackup2DaysOld`.

### Changed

- Rename `dipstick` report count metric.
@@ -23,21 +23,23 @@ spec:
severity: page
team: {{ include "providerTeam" . }}
topic: etcd
- alert: LatestETCDBackup1DayOld
{{- if eq .Values.managementCluster.provider.flavor "capi" }}
- alert: LatestETCDBackup2DaysOld
annotations:
description: '{{`Latest successfull ETCD backup for {{ $labels.cluster_id }}/{{ $labels.tenant_cluster_id }} was more than 24h ago.`}}'
description: '{{`Latest successfull ETCD backup for {{ $labels.cluster_id }}/{{ $labels.tenant_cluster_id }} was more than 48h ago.`}}'
opsrecipe: etcd-backup-failed/
expr: (time() - etcd_backup_latest_success{tenant_cluster_id!="Control Plane"}) > 24 * 60 * 60 and (time() - etcd_backup_latest_success{tenant_cluster_id!="Control Plane"}) < 48 * 60 * 60
expr: count(label_replace(capi_cluster_created, "tenant_cluster_id", "$1", "name", "(.*)")) by (tenant_cluster_id) > 48 * 60 * 60 unless count((time() - etcd_backup_latest_success{tenant_cluster_id!="Control Plane"}) > 48 * 60 * 60) by (tenant_cluster_id)
for: 5m
labels:
area: kaas
cancel_if_cluster_status_creating: "true"
cancel_if_cluster_status_deleting: "true"
cancel_if_cluster_status_updating: "true"
cancel_if_outside_working_hours: "true"
severity: notify
severity: page
team: {{ include "providerTeam" . }}
topic: etcd-backup
{{- else }}
- alert: LatestETCDBackup2DaysOld
annotations:
description: '{{`Latest successfull ETCD backup for {{ $labels.cluster_id }}/{{ $labels.tenant_cluster_id }} was more than 48h ago.`}}'
@@ -53,6 +55,7 @@ spec:
severity: page
team: {{ include "providerTeam" . }}
topic: etcd-backup
{{- end }}
- alert: ManagementClusterNotBackedUp24h
annotations:
description: '{{`{{ $labels.cluster_id }} management cluster''s ETCD backup was unsuccessful.`}}'
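
The intent stated in the changelog entry (check the CAPI cluster's creation time before paging on a stale backup) can be sketched outside PromQL. The Python below is a minimal illustration of that intent only, not a translation of the rule's expression. It assumes `capi_cluster_created` carries the cluster's creation time as a Unix timestamp; the function name `should_page` and its parameters are hypothetical.

```python
from typing import Optional

# Same 48h window that appears in the alert expression.
THRESHOLD_SECONDS = 48 * 60 * 60


def should_page(now: float, cluster_created: float,
                latest_backup_success: Optional[float]) -> bool:
    """Decide whether a stale-backup page is warranted for one cluster.

    Hypothetical helper: all arguments are Unix timestamps in seconds.
    latest_backup_success is None when no backup has succeeded yet.
    """
    cluster_age = now - cluster_created
    if cluster_age <= THRESHOLD_SECONDS:
        # Assumption: a cluster younger than the window is skipped.
        return False
    if latest_backup_success is None:
        # Old enough, and no successful backup recorded at all.
        return True
    # Old enough: page only if the last success is outside the window.
    return (now - latest_backup_success) > THRESHOLD_SECONDS
```

For example, under these assumptions a cluster created one hour ago would not page even with no backup recorded, while a three-day-old cluster whose last successful backup was 49 hours ago would.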