
Releases: giantswarm/prometheus-rules

v4.3.2

21 Jun 07:44
53f5450

Added

  • Added new alerting rules to monitor the Prometheus instance that reads data from Mimir and sends it to Grafana Cloud.
  • Added a recording rule to send Mimir memory usage and the number of metrics to Grafana Cloud.

v4.3.1

18 Jun 08:56
f99103f

Changed

  • Increase the delay on volume-filling alerts so that node-problem-detector has time to shut down nodes properly.

Fixed

  • Fix cert-exporter alerts to render the secret namespace and not the cert-exporter namespace in the alert description.

Removed

  • Remove old KaaS DaemonSet SLOs, as they are now covered by Sloth SLOs.
  • Remove old Cilium DaemonSet SLOs, as they are now covered by Sloth SLOs.

v4.3.0

17 Jun 13:06
e8bfcde

Removed

  • Remove old cloud-api SLOs, as they are now covered by Sloth SLOs.
  • Remove the old Heartbeat and MatchingNumberOfPrometheusAndCluster alerts on Mimir-equipped installations.

v4.2.1

14 Jun 13:29
037b865

Fixed

  • Removed duplicate slo-target on AWS.

Changed

  • Finish reviewing turtles alerts for multi-provider MCs and Mimir.
    • Prefix all vintage alerts with vintage to facilitate maintenance.
    • Fix kubelet container runtime alerts.
    • Fix alerts to use the pod label instead of pod_name.

v4.2.0

13 Jun 14:56
b330f73

Added

  • Added a new alerting rule to falco.rules.yml that fires an alert for the XZ backdoor.
  • Added the CiliumAPITooSlow alert.
  • Added CODEOWNERS files.

Changed

  • Restrict grafana-agent-rules CiliumNetworkPolicy.
  • Use ready replicas for Kyverno webhooks alert.
  • Sort out shared alert ownership by assigning every shared alert to a team.
  • Review and fix phoenix alerts towards Mimir and multi-provider MCs.
    • Move core component alerts (cluster-autoscaler, vertical-pod-autoscaler, kubelet, etcd-kubernetes-resources-count-exporter, certificates) from phoenix to turtles.
    • Split the phoenix job alerts in two:
      • Add the AWS-specific job alerts to the vintage.aws.management-cluster.rules file.
      • Move the rest of job.rules to turtles, as it is provider-independent.
    • Prefix all vintage alerts with vintage to facilitate maintenance.
    • Merge kiam and inhibit.kiam into one file.
    • Support any AWS WC in the aws-load-balancer-controller alerts.
    • Create a shared IRSA alerts rule file to avoid duplication between CAPA and vintage AWS.
  • Review and fix cabbage alerts for multi-provider MCs and Mimir.
  • Review and fix shield alerts for multi-provider MCs and Mimir.
  • Review and fix honeybadger alerts for multi-provider MCs and Mimir.
  • Review and fix bigmac alerts for multi-provider MCs and Mimir.
    • Fix ManagementClusterDexAppMissing's use of absent for Mimir.
    • Update team bigmac rules based on the label changes.
  • Review and fix atlas alerts for multi-provider MCs and Mimir.
    • Fix alerts using absent metrics for Mimir.
  • Review and fix turtles alerts for multi-provider MCs and Mimir.
    • Fix alerts using absent metrics for Mimir.
    • Reviewed turtles alerts labels.

Fixed

  • Fixed usage of yq and jq in check-opsrecipes.sh.
  • Fetch jq with make install-tools.
  • Fixed and improved the check-opsrecipes.sh script to support /_index.md-based ops-recipes.
  • Fixed all area alert labels.
  • Fixed cert-exporter alerts to page on all providers.
  • Fixed cilium SLO recording rule, setting a proper threshold for the alert.

Removed

  • cleanup: remove the microendpoint alerts, as they never fired and probably never will.
  • cleanup: remove scrape timeout inhibition leftovers (documentation and labels).

v4.1.2

31 May 11:48
80d4281

Changed

  • Updated ContainerdVolumeSpaceTooLow, KubeletVolumeSpaceTooLow and LogVolumeSpaceTooLow alerts to not trigger when the node-problem-detector is already remediating the issue.

v4.1.1

30 May 14:39
bddf46d

Changed

  • Get rid of the app, role and node external labels in Atlas rules.

v4.1.0

30 May 12:24
6442f5b
Compare
Choose a tag to compare

Added

  • Add aggregation:capi_infrastructure_crd_versions metric to Grafana Cloud.

Fixed

  • Fix remaining pint issues.

Removed

  • Remove api-server from old SLO framework.

v4.0.0

29 May 09:21
adc6718

Changed

  • ! Breaking change !

Added

v3.15.0

27 May 11:54
1627ef2

Removed

  • Remove old atlas SLO alerts in favor of Sloth alerts.

Changed

  • pint tests now run automatically in CI, and their make target names have changed.

Fixed

  • Fix node load alerts for CAPI clusters.
  • Remove trailing spaces in rules.
  • Fix KubeStateMetricsNotRetrievingMetrics on Mimir.
  • Fix nginx ingress controller opsrecipe link.