Releases: giantswarm/prometheus-rules
v4.30.0
v4.29.0
Changed
- Increase time to trigger PromtailRequestsErrors alert from 15 to 25m.
v4.28.0
Added
- Add alert to monitor the KubeadmConfig CRs having trouble generating bootstrap data.
Changed
- Ignore HelmReleases in e2e test organization namespaces for cabbage FluxHelmReleaseFailed (cilium, network-policies, coredns).
v4.27.0
Added
- KongProductionDeploymentNotSatisfied to alert on clusters starting with "p".
- KongNonProdDeploymentNotSatisfied to alert on clusters not starting with "p".
Removed
- Split KongDeploymentNotSatisfied into KongProductionDeploymentNotSatisfied and KongNonProdDeploymentNotSatisfied to be able to control alerting in- and outside business hours.
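The split above can be pictured with a small sketch. It assumes that "starting with p" refers to the cluster ID prefix; the function name kong_alert_for is hypothetical and is not part of the repository, where the actual selection happens in the alert expressions themselves.

```python
def kong_alert_for(cluster_id: str) -> str:
    """Return which of the two split alerts would cover a given cluster.

    Assumption: the "p" prefix is matched against the cluster ID.
    """
    if cluster_id.startswith("p"):
        return "KongProductionDeploymentNotSatisfied"
    return "KongNonProdDeploymentNotSatisfied"
```

Keeping the two cases as separate alerts is what allows each one to be routed differently in- and outside business hours.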
v4.26.2
Changed
- Remove label_replace from app_operator_app_info based alerts and use the cluster_id from the metric on CAPI.
Added
- Add cloud-provider-controller.rules to monitor the cloud-provider-controller components across providers.
- Add alerts to monitor the HelmReleases for cilium and coredns.
- Add alert to monitor the HelmRelease for the vertical-pod-autoscaler-crd app.
- Add alert to monitor Shield pods restarts.
- Add MimirRulerTooManyFailedQueries alert to detect when Mimir ruler is failing to evaluate rules.
Fixed
- Fix dashboard link for MimirContinuousTestFailing alert.
- Fix tests so they fail if some helm template fails to render.
v4.26.1
Changed
- MimirObjectStorageLowRate and LokiObjectStorageLowRate only check management cluster apps
- MimirObjectStorageLowRate and LokiObjectStorageLowRate are less sensitive
v4.26.0
Changed
- Bump alloy-rules app version to 0.7.0
- Upgrade alloy from 1.4.2 to 1.5.0
Added
- new MimirObjectStorageLowRate alert
- new LokiObjectStorageLowRate alert
v4.25.0
Changed
- Mimir compactor alert: better failure detection
Added
- Add new mimir continuous test alerts:
  - MimirContinuousTestFailingOnWrites
  - MimirContinuousTestFailingOnReads
  - MimirContinuousTestMissing
  - MimirContinuousTestFailing
Removed
- Remove the mimir.enabled property to replace it with the MC flavor as all CAPI MCs now run Mimir.
v4.24.1
Fixed
- Fix MonitoringAgentDown to page when both prometheus-agent and alloy-metrics jobs are missing.
v4.24.0
Added
- Add a set of sensible alerts to monitor alloy.
  - AlloySlowComponentEvaluations and AlloyUnhealthyComponents to report about alloy component state.
  - LoggingAgentDown to be alerted when the logging agent is down.
  - LogForwardingErrors to be alerted when the loki.write component is failing.
  - LogReceivingErrors to be alerted when the loki.source.api component of the gateway is failing.
  - MonitoringAgentDown to be alerted when the monitoring agent is down.
  - MonitoringAgentShardsNotSatisfied to be alerted when the monitoring agent is missing any number of desired shards.
Changed
- Update DeploymentNotSatisfiedAtlas to take into account the following components:
  - observability-operator
  - alloy-rules
  - observability-gateway
- Move all grafana-cloud related alerts to their own file.
- Move all alloy related alerts to the alloy alert file.
- Rename and move the following alerts as they are not specific to Prometheus:
  - PrometheusCriticalJobScrapingFailure => CriticalJobScrapingFailure
  - PrometheusJobScrapingFailure => JobScrapingFailure
  - PrometheusFailsToCommunicateWithRemoteStorageAPI => MetricForwardingErrors
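Anything that refers to the old alert names (silences, routing matchers, runbook links) may need updating after these renames. A minimal sketch of the mapping, where RENAMED_ALERTS and migrate_alert_names are hypothetical helper names and not part of the repository:

```python
# Old alert name -> new alert name, as listed in the v4.24.0 notes.
RENAMED_ALERTS = {
    "PrometheusCriticalJobScrapingFailure": "CriticalJobScrapingFailure",
    "PrometheusJobScrapingFailure": "JobScrapingFailure",
    "PrometheusFailsToCommunicateWithRemoteStorageAPI": "MetricForwardingErrors",
}


def migrate_alert_names(names):
    """Replace renamed alert names and leave all other names untouched."""
    return [RENAMED_ALERTS.get(name, name) for name in names]
```

For example, migrate_alert_names(["PrometheusJobScrapingFailure", "MonitoringAgentDown"]) leaves the second name unchanged and rewrites only the first.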