Merge branch 'main' into move-shared-component-alerts-to-turtles
QuentinBisson authored Jun 9, 2024
2 parents c744975 + da92a86 commit ba82e6f
Showing 6 changed files with 57 additions and 17 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Fixed
+
+- Fixed usage of yq, and jq in check-opsrecipes.sh
+- Fetch jq with make install-tools
+
 ### Added

 - Added a new alerting rule to `falco.rules.yml` to fire an alert for XZ-backdoor.
@@ -16,6 +21,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - Review phoenix alerts towards Mimir.
 - Moves ownership of alerts for shared components to turtles.
+- Split the phoenix job alert into 2:
+  - a new file named job.aws.rules that contains the aws specific alerts
+  - move the rest of job.rules into the shared alerts because it is provider independent
+- Move the management cluster certificate alerts into the shared alerts because it is provider independent
+- Review and fix phoenix alerts towards Mimir and multi-provider MCs.
+- Moves cluster-autoscaler and vpa alerts to turtles.

 ### Fixed

@@ -0,0 +1,29 @@
+## TODO Remove with vintage
+# This rule applies to vintage aws management clusters
+{{- if eq .Values.managementCluster.provider.flavor "vintage" }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  creationTimestamp: null
+  labels:
+    {{- include "labels.common" . | nindent 4 }}
+    # No need for .Values.mimir.enabled condition - will be gone with Vintage
+    cluster_type: "management_cluster"
+  name: aws.job.rules
+  namespace: {{ .Values.namespace }}
+spec:
+  groups:
+  - name: aws-jobs
+    rules:
+    - alert: JobHasNotBeenScheduledForTooLong
+      annotations:
+        description: '{{`CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has not been scheduled for more than 2 hours.`}}'
+        opsrecipe: job-has-not-been-scheduled-for-too-long/
+      expr: (time() - kube_cronjob_status_last_schedule_time{cronjob="route53-manager"}) > 7200
+      for: 15m
+      labels:
+        area: kaas
+        severity: page
+        team: phoenix
+        topic: managementcluster
+{{- end }}
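
Note on the expression above: the alert condition is a plain age check, "now minus last schedule time is greater than 7200 seconds", which matches the "2 hours" in the description. A minimal sketch of the same comparison in shell follows; the function name `is_overdue` and the sample timestamps are hypothetical and only illustrate the arithmetic, not how Prometheus evaluates the rule.

```shell
#!/bin/sh
# Illustrative only: mirrors the arithmetic of
#   (time() - kube_cronjob_status_last_schedule_time{...}) > 7200
# is_overdue and the timestamps below are hypothetical.

# Succeeds (exit 0) when the last schedule is more than 7200 s (2 hours) old.
# $1: current time (Unix seconds), $2: last schedule time (Unix seconds)
is_overdue() {
    [ $(( $1 - $2 )) -gt 7200 ]
}

now=1717920000
is_overdue "$now" $(( now - 3 * 3600 )) && echo "overdue"       # scheduled 3 hours ago
is_overdue "$now" $(( now - 1800 ))     || echo "on schedule"   # scheduled 30 minutes ago
```

In the rule itself the condition additionally has to hold for 15 minutes (`for: 15m`) before the alert fires.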
@@ -23,13 +23,13 @@ spec:
         area: kaas
         cancel_if_outside_working_hours: "true"
         severity: page
-        team: phoenix
+        team: {{ include "providerTeam" . }}
         topic: security
-    - alert: ManagementClusterAWSCertificateWillExpireInLessThanOneMonth
+    - alert: ManagementClusterCertificateWillExpireInLessThanOneMonth
       annotations:
         description: '{{`Certificate {{ $labels.path }} on {{ $labels.node }} will expire in less than one month.`}}'
         opsrecipe: renew-certificates/
-      expr: (cert_exporter_not_after{cluster_type="management_cluster", provider="aws", path!="/etc/kubernetes/ssl/service-account-crt.pem"} - time()) < 4 * 7 * 24 * 60 * 60
+      expr: (cert_exporter_not_after{cluster_type="management_cluster", path!="/etc/kubernetes/ssl/service-account-crt.pem"} - time()) < 4 * 7 * 24 * 60 * 60
       for: 5m
       labels:
         area: kaas
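
Note on the threshold in this expression: `4 * 7 * 24 * 60 * 60` is four weeks expressed in seconds, so "less than one month" here means less than 28 days. The sketch below only works through that arithmetic; the helper `expires_soon` and its sample values are hypothetical.

```shell
#!/bin/sh
# Illustrative only: expires_soon and the sample values are hypothetical.

# 4 weeks * 7 days * 24 hours * 60 minutes * 60 seconds
THRESHOLD_SECONDS=$(( 4 * 7 * 24 * 60 * 60 ))
echo "$THRESHOLD_SECONDS"    # 2419200 seconds, i.e. 28 days

# Succeeds when the time left before notAfter is below the threshold.
# $1: certificate notAfter (Unix seconds), $2: current time (Unix seconds)
expires_soon() {
    [ $(( $1 - $2 )) -lt "$THRESHOLD_SECONDS" ]
}

expires_soon 1000000 0 && echo "expires within 4 weeks"
```

With the `provider="aws"` matcher removed, the same threshold applies to management cluster certificates regardless of provider.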
@@ -21,16 +21,3 @@ spec:
         severity: notify
         team: {{ include "providerTeam" . }}
         topic: managementcluster
-{{- if eq .Values.managementCluster.provider.kind "aws" }}
-    - alert: JobHasNotBeenScheduledForTooLong
-      annotations:
-        description: '{{`CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has not been scheduled for more than 2 hours.`}}'
-        opsrecipe: job-has-not-been-scheduled-for-too-long/
-      expr: (time() - kube_cronjob_status_last_schedule_time{cronjob="route53-manager"}) > 7200
-      for: 15m
-      labels:
-        area: kaas
-        severity: page
-        team: phoenix
-        topic: managementcluster
-{{- end }}
6 changes: 5 additions & 1 deletion test/hack/bin/check-opsrecipes.sh
@@ -86,6 +86,10 @@ main() {
 local -a E_unexistingrecipe=()
 local returncode=0

+local -r GIT_WORKDIR="$(git rev-parse --show-toplevel)"
+local -r YQ=test/hack/bin/yq
+local -r JQ=test/hack/bin/jq
+
 # Investigation section
 ########################

@@ -144,7 +148,7 @@ main() {
 fi

 # parse rules yaml files, and for each rule found output alertname, opsrecipe, and severity, space-separated, on one line.
-done < <(yq -o json "$rulesFile" | jq -j '.spec.groups[].rules[] | .alert, " ", .annotations.opsrecipe, " ", .labels.severity, "\n"')
+done < <("$GIT_WORKDIR/$YQ" -o json "$rulesFile" | "$GIT_WORKDIR/$JQ" -j '.spec.groups[]?.rules[] | .alert, " ", .annotations.opsrecipe, " ", .labels.severity, "\n"')

 checkedRules+=("$rulesFile")
 done < <(find $RULES_FILES -type f -print0)
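
Note on the `[]?` in the new filter: in jq, `.[]` fails with "Cannot iterate over null" when its input is null, while `.[]?` suppresses that error and yields no output. This would matter, for example, for an input that has no `.spec.groups` at all. A minimal sketch, assuming `jq` is on the PATH; `extract_rules` is a hypothetical wrapper around the filter from the diff, and the JSON inputs are made up for illustration.

```shell
#!/bin/sh
# Illustrative only: extract_rules is a hypothetical wrapper around the filter
# used in check-opsrecipes.sh. It reads JSON on stdin and needs jq on PATH.
extract_rules() {
    jq -j '.spec.groups[]?.rules[] | .alert, " ", .annotations.opsrecipe, " ", .labels.severity, "\n"'
}

if command -v jq >/dev/null 2>&1; then
    # A document with one rule prints one space-separated line.
    printf '%s' '{"spec":{"groups":[{"rules":[{"alert":"A","annotations":{"opsrecipe":"r/"},"labels":{"severity":"page"}}]}]}}' | extract_rules

    # A document without .spec.groups (here: null) prints nothing and exits 0.
    printf 'null' | extract_rules
fi
```

Without the `?`, the second call would make jq exit with an error instead of producing an empty result.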
9 changes: 9 additions & 0 deletions test/hack/bin/fetch-tools.sh
@@ -6,6 +6,7 @@ ARCHITECT_VERSION="6.8.0"
 PROMETHEUS_VERSION="2.41.0"
 HELM_VERSION="3.9.0"
 YQ_VERSION="4.26.1"
+JQ_VERSION="1.7.1"
 PINT_VERSION="0.58.1"

 GIT_WORKDIR=$(git rev-parse --show-toplevel)
@@ -19,6 +20,8 @@ Linux*)
 export ARCHITECT_SOURCE="https://github.com/giantswarm/architect/releases/download/v${ARCHITECT_VERSION}/architect-v${ARCHITECT_VERSION}-linux-amd64.tar.gz"
 export YQ_SOURCE="https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/yq_linux_amd64.tar.gz"
 export YQ_BIN_FILE="yq_linux_amd64"
+export JQ_SOURCE="https://github.com/jqlang/jq/releases/download/jq-${JQ_VERSION}/jq-linux-amd64"
+export JQ_BIN_FILE="jq"
 export PINT_SOURCE="https://github.com/cloudflare/pint/releases/download/v${PINT_VERSION}/pint-${PINT_VERSION}-linux-amd64.tar.gz"
 export PINT_BIN_FILE="pint-linux-amd64"
 ;;
@@ -29,6 +32,8 @@ Darwin*)
 export ARCHITECT_SOURCE="https://github.com/giantswarm/architect/releases/download/v${ARCHITECT_VERSION}/architect-v${ARCHITECT_VERSION}-darwin-amd64.tar.gz"
 export YQ_SOURCE="https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/yq_darwin_amd64.tar.gz"
 export YQ_BIN_FILE="yq_darwin_amd64"
+export JQ_SOURCE="https://github.com/jqlang/jq/releases/download/jq-${JQ_VERSION}/jq-macos-amd64"
+export JQ_BIN_FILE="jq"
 export PINT_SOURCE="https://github.com/cloudflare/pint/releases/download/v${PINT_VERSION}/pint-${PINT_VERSION}-darwin-amd64.tar.gz"
 export PINT_BIN_FILE="pint-darwin-amd64"
 TAR_CMD="gtar"
@@ -107,6 +112,10 @@ main() {
 "${GIT_WORKDIR}/test/hack/bin/yq-${YQ_VERSION}.tar.gz" \
 "$YQ_SOURCE" \
 "*/yq_*"
+download \
+"${JQ_SOURCE}" \
+"${GIT_WORKDIR}/test/hack/bin/${JQ_BIN_FILE}"
+chmod +x "${GIT_WORKDIR}/test/hack/bin/${JQ_BIN_FILE}"
 if [[ ! -f "${GIT_WORKDIR}/test/hack/bin/yq" ]]; then
 ln -s "${GIT_WORKDIR}/test/hack/bin/${YQ_BIN_FILE}" "${GIT_WORKDIR}/test/hack/bin/yq"
 fi
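
Note on the jq download: unlike the yq and pint sources, which are `.tar.gz` archives, the jq URLs in this diff point at a bare binary, which is why the script downloads it straight to `test/hack/bin/jq` and then runs `chmod +x`. The per-platform URL selection can be sketched as below; `jq_source_url` is a hypothetical helper, not a function in `fetch-tools.sh`, and it covers only the two amd64 assets the script references.

```shell
#!/bin/sh
# Illustrative only: jq_source_url is a hypothetical helper, not part of
# fetch-tools.sh. It covers only the two amd64 assets the script references.
# $1: kernel name as printed by `uname -s`, $2: jq version
jq_source_url() {
    case "$1" in
    Linux*)  asset="jq-linux-amd64" ;;
    Darwin*) asset="jq-macos-amd64" ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
    echo "https://github.com/jqlang/jq/releases/download/jq-$2/$asset"
}

jq_source_url Linux 1.7.1
jq_source_url Darwin 1.7.1
```

Both platforms end up with the same local file name (`JQ_BIN_FILE="jq"`), so `check-opsrecipes.sh` can refer to `test/hack/bin/jq` without a platform check.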