Releases: giantswarm/prometheus-rules

v4.30.0

10 Dec 13:24
bb96c0a

Added

  • Add alerts for karpenter issues.

v4.29.0

09 Dec 13:26
b666430

Changed

  • Increase the time to trigger the PromtailRequestsErrors alert from 15m to 25m.

v4.28.0

02 Dec 08:31
b031f50

Added

  • Add alert to monitor KubeadmConfig CRs that have trouble generating bootstrap data.

Changed

  • Ignore HelmReleases in e2e test organization namespaces for cabbage FluxHelmReleaseFailed (cilium, network-policies, coredns)

v4.27.0

27 Nov 15:47
fc7e566

Added

  • KongProductionDeploymentNotSatisfied to alert on clusters starting with p.
  • KongNonProdDeploymentNotSatisfied to alert on clusters not starting with p.

Removed

  • Split KongDeploymentNotSatisfied into KongProductionDeploymentNotSatisfied and KongNonProdDeploymentNotSatisfied to be able to control alerting inside and outside business hours.

v4.26.2

27 Nov 10:52
97c68f6

Changed

  • Remove label_replace from app_operator_app_info based alerts and use the cluster_id from the metric on CAPI.

Added

  • Add cloud-provider-controller.rules to monitor the cloud-provider-controller components across providers.
  • Add alerts to monitor the HelmReleases for cilium and coredns.
  • Add alert to monitor the HelmRelease for the vertical-pod-autoscaler-crd app.
  • Add alert to monitor Shield pod restarts.
  • Add MimirRulerTooManyFailedQueries alert to detect when Mimir ruler is failing to evaluate rules

Fixed

  • Fix dashboard link for MimirContinuousTestFailing alert
  • Fix tests so they fail if some helm template fails to render

v4.26.1

19 Nov 12:59
4ac59b0

Changed

  • MimirObjectStorageLowRate and LokiObjectStorageLowRate only check management cluster apps
  • MimirObjectStorageLowRate and LokiObjectStorageLowRate are less sensitive

v4.26.0

19 Nov 08:56
6b82c35

Changed

  • Bump alloy-rules app version to 0.7.0
    • Upgrades alloy from 1.4.2 to 1.5.0

Added

  • Add new MimirObjectStorageLowRate alert
  • Add new LokiObjectStorageLowRate alert

v4.25.0

18 Nov 08:42
e8a7dee

Changed

  • Mimir compactor alert: better failure detection

Added

  • Add new mimir continuous test alerts:
    • MimirContinuousTestFailingOnWrites
    • MimirContinuousTestFailingOnReads
    • MimirContinuousTestMissing
    • MimirContinuousTestFailing

Removed

  • Remove the mimir.enabled property and replace it with the MC flavor, as all CAPI MCs now run Mimir.

v4.24.1

12 Nov 09:46
f01631d

Fixed

  • Fix MonitoringAgentDown to page when both prometheus-agent and alloy-metrics jobs are missing.

v4.24.0

12 Nov 07:54
bf0d4f5

Added

  • Add a set of sensible alerts to monitor alloy.
    • AlloySlowComponentEvaluations and AlloyUnhealthyComponents to report about alloy component state.
    • LoggingAgentDown to be alerted when the logging agent is down.
    • LogForwardingErrors to be alerted when the loki.write component is failing.
    • LogReceivingErrors to be alerted when the loki.source.api component of the gateway is failing.
    • MonitoringAgentDown to be alerted when the monitoring agent is down.
    • MonitoringAgentShardsNotSatisfied to be alerted when the monitoring agent is missing any number of desired shards.

Changed

  • Update DeploymentNotSatisfiedAtlas to take into account the following components:
    • observability-operator
    • alloy-rules
    • observability-gateway
  • Move all grafana-cloud related alerts to their own file.
  • Move all alloy related alerts to the alloy alert file.
  • Rename and move the following alerts as they are not specific to Prometheus:
    • PrometheusCriticalJobScrapingFailure => CriticalJobScrapingFailure
    • PrometheusJobScrapingFailure => JobScrapingFailure
    • PrometheusFailsToCommunicateWithRemoteStorageAPI => MetricForwardingErrors