Commit

Add LokiMissingSpotEntries and LokiMissingWebsocketEntries alerts
TheoBrigitte committed Nov 22, 2024
1 parent 428041b commit 3f9271e
Showing 2 changed files with 49 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add `cloud-provider-controller.rules` to monitor the cloud-provider-controller components across providers.
- Add alerts to monitor the `HelmReleases` for `cilium` and `coredns`.
- Add alert to monitor the `HelmRelease` for the `vertical-pod-autoscaler-crd` app.
- Add `LokiMissingSpotEntries` alert to monitor missing spot entries in Loki.
- Add `LokiMissingWebsocketEntries` alert to monitor missing websocket entries in Loki.

### Fixed

@@ -195,6 +195,53 @@ spec:
severity: page
team: atlas
topic: observability
- alert: LokiMissingSpotEntries
annotations:
dashboard: loki-canary/loki-canary
description: This alert checks that loki is not missing canary spot entries
opsrecipe: loki/
expr: |
(
sum by (cluster_id, pod, installation, pipeline, provider)
(increase(loki_canary_spot_check_missing_entries_total{cluster_type="management_cluster",namespace="loki"}[5m]))
/
sum by (cluster_id, pod, installation, pipeline, provider)
(increase(loki_canary_spot_check_entries_total{cluster_type="management_cluster",namespace="loki"}[5m]))
) > 0
for: 30m
labels:
area: platform
cancel_if_cluster_control_plane_unhealthy: "true"
cancel_if_cluster_status_creating: "true"
cancel_if_cluster_status_deleting: "true"
cancel_if_cluster_status_updating: "true"
cancel_if_outside_working_hours: "true"
severity: page
team: atlas
topic: observability
- alert: LokiMissingWebsocketEntries
annotations:
dashboard: loki-canary/loki-canary
description: This alert checks that loki is not missing canary websocket entries
opsrecipe: loki/
expr: |
(
sum by (cluster_id, pod, installation, pipeline, provider)
(increase(loki_canary_websocket_missing_entries_total{cluster_type="management_cluster",namespace="loki"}[5m]))
/
sum by (cluster_id, pod, installation, pipeline, provider)
(increase(loki_canary_entries_total{cluster_type="management_cluster",namespace="loki"}[5m]))
) > 0
for: 30m
labels:
area: platform
cancel_if_cluster_control_plane_unhealthy: "true"
cancel_if_cluster_status_creating: "true"
cancel_if_cluster_status_deleting: "true"
cancel_if_cluster_status_updating: "true"
cancel_if_outside_working_hours: "true"
severity: page
team: atlas
topic: observability
- alert: LokiObjectStorageLowRate
annotations:
dashboard: loki-operational/loki-operational
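
Both new alerts share one shape: sum the missing-entry increase and the total-entry increase by the same labels, divide, and keep any series whose ratio is above zero. A minimal Python sketch of that evaluation is below, assuming the per-series `increase(...[5m])` values are already available as `(labels, value)` pairs. The function names, the sample label values (`mc1`, `loki-canary-a`, and so on) and the numbers are hypothetical and only illustrate the arithmetic; the `for: 30m` hold period is not modeled.

```python
import math
from collections import defaultdict

# Labels used in the alerts' "sum by (...)" clause.
GROUP_BY = ("cluster_id", "pod", "installation", "pipeline", "provider")


def sum_by(samples, group_by=GROUP_BY):
    """Sum sample values per label group, like PromQL's `sum by (...)`."""
    sums = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in group_by)
        sums[key] += value
    return dict(sums)


def missing_ratio_above_zero(missing, total):
    """Return {group: ratio} for groups where missing / total > 0.

    Only groups present on both sides are compared, mirroring one-to-one
    vector matching on identical label sets.
    """
    num = sum_by(missing)
    den = sum_by(total)
    firing = {}
    for key in num.keys() & den.keys():
        m, t = num[key], den[key]
        if t == 0:
            # PromQL float division: x/0 is +Inf for x > 0, NaN for 0/0.
            ratio = math.inf if m > 0 else math.nan
        else:
            ratio = m / t
        if ratio > 0:  # NaN compares false, so 0/0 never fires
            firing[key] = ratio
    return firing


def example_labels(pod):
    # Hypothetical label values, for illustration only.
    return {
        "cluster_id": "mc1",
        "pod": pod,
        "installation": "example",
        "pipeline": "testing",
        "provider": "example-provider",
    }


missing = [(example_labels("loki-canary-a"), 2.0),
           (example_labels("loki-canary-b"), 0.0)]
total = [(example_labels("loki-canary-a"), 100.0),
         (example_labels("loki-canary-b"), 100.0)]

result = missing_ratio_above_zero(missing, total)
```

In this sketch only the `loki-canary-a` group is returned, because the other group has a ratio of exactly zero. Because the threshold is `> 0`, a single missing entry within the 5m window is enough to make the condition true for that window.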