Improve Alb alerts (#1061)
* Improve `AWS load balancer controller` alert's query.

whites11 authored Mar 11, 2024
1 parent debd22d commit a81bd4c
Showing 2 changed files with 10 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
```diff
@@ -10,12 +10,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 
 - Add new mimir.enabled property to disable the MC/WC split in alerts.
+- Add new alert for reconciling errors of `AWS load balancer controller`.
 
 ### Changed
 
 - Change ownership of `CadvisorDown` to Turtles/Phoenix.
 - Review alerting prior to Mimir migration.
 - Increase duration for fluentbit rules to avoid false alerts when a new release is deployed.
+- Improve `AWS load balancer controller` alert for failed AWS calls query.
 
 ### Removed
 
```
```diff
@@ -14,11 +14,11 @@ spec:
   groups:
   - name: aws-load-balancer-controller
     rules:
-    - alert: AWSLoadBalancerAssumeRoleErrors
+    - alert: AWSLoadBalancerControllerAWSAPIErrors
       annotations:
-        description: '{{`AWS load balancer pod {{ $labels.namespace}}/{{ $labels.pod_name }} on {{ $labels.cluster_id}}/{{ $labels.cluster }} can not assume the role.`}}'
-        opsrecipe: alb-role-errors#assume-role-errors
-      expr: increase(aws_api_calls_total{error_code="WebIdentityErr"}[20m]) > 0
+        description: '{{`AWS load balancer controller pod {{ $labels.namespace}}/{{ $labels.pod }} on {{ $labels.cluster_id}} is throwing {{ $labels.error_code }} errors when contacting AWS API.`}}'
+        opsrecipe: alb-errors
+      expr: sum(increase(aws_api_calls_total{error_code != ""}[20m])) by (error_code,namespace,pod,cluster_id) > 0
       for: 40m
       labels:
         area: managedservices
@@ -29,11 +29,11 @@ spec:
         severity: page
         team: phoenix
         topic: alb
-    - alert: AWSLoadBalancerRolePolicyErrors
+    - alert: AWSLoadBalancerControllerReconcileErrors
       annotations:
-        description: '{{`AWS load balancer pod {{ $labels.namespace}}/{{ $labels.pod_name }} on {{ $labels.cluster_id}}/{{ $labels.cluster }} has a wrong role policy.`}}'
-        opsrecipe: alb-role-errors#role-policy-errors
-      expr: increase(aws_api_calls_total{error_code="UnauthorizedOperation"}[20m]) > 0
+        description: '{{`AWS load balancer controller pod {{ $labels.namespace }}/{{ $labels.pod }} on {{ $labels.cluster_id }} is throwing errors while reconciling the {{ $labels.controller }} controller.`}}'
+        opsrecipe: alb-errors
+      expr: sum(increase(controller_runtime_reconcile_total{result = "error"}[20m])) by (controller,namespace,pod,cluster_id) > 0
       for: 40m
       labels:
         area: managedservices
```
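The new `AWSLoadBalancerControllerAWSAPIErrors` expression aggregates before it compares. As a hypothetical illustration, the Python sketch below mimics `sum(increase(aws_api_calls_total{error_code != ""}[20m])) by (error_code,namespace,pod,cluster_id) > 0` over a few hand-made series. The sample values, the pod/cluster names and the extra `operation` label are assumptions, and `simple_increase` is a simplified stand-in that handles counter resets but skips the window-boundary extrapolation PromQL's `increase()` performs, so real numbers can differ.

```python
from collections import defaultdict

# Hypothetical scrape data: (labels, [(timestamp_seconds, counter_value), ...]).
# Label names mirror the alert rule; the values and the "operation" label
# are illustrative assumptions, not real controller output.
POD = {"namespace": "kube-system", "pod": "alb-1", "cluster_id": "abc"}
API_CALLS = [
    ({**POD, "error_code": "AccessDenied", "operation": "DescribeLoadBalancers"},
     [(300, 0), (600, 1), (900, 3), (1200, 5)]),
    ({**POD, "error_code": "AccessDenied", "operation": "CreateTargetGroup"},
     [(300, 1), (600, 1), (900, 1), (1200, 1)]),
    ({**POD, "error_code": "Throttling", "operation": "DescribeTags"},
     [(300, 10), (600, 2), (900, 4), (1200, 4)]),  # counter reset between 300s and 600s
    ({**POD, "error_code": "", "operation": "DescribeLoadBalancers"},
     [(300, 0), (600, 30), (900, 60), (1200, 90)]),  # successful calls
]


def simple_increase(samples, window_end, window_seconds):
    """Counter growth over (window_end - window_seconds, window_end].

    Simplified: handles counter resets like PromQL's increase(), but does
    not extrapolate to the window boundaries.
    """
    values = [v for t, v in samples if window_end - window_seconds < t <= window_end]
    if len(values) < 2:
        return None  # increase() also yields nothing with fewer than two samples
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # A drop means the counter restarted from zero.
        total += cur - prev if cur >= prev else cur
    return total


def sum_increase_by(series, matcher, by, window_end, window_seconds):
    """Mimic `sum(increase(metric{matcher}[window])) by (by...) > 0`."""
    groups = defaultdict(float)
    for labels, samples in series:
        if not matcher(labels):
            continue
        inc = simple_increase(samples, window_end, window_seconds)
        if inc is not None:
            # Prometheus treats an absent label and an empty one the same way.
            groups[tuple(labels.get(name, "") for name in by)] += inc
    # "> 0" drops groups whose summed increase is zero.
    return {key: value for key, value in groups.items() if value > 0}


# error_code != "" keeps only series that carry a non-empty error code.
api_error_groups = sum_increase_by(
    API_CALLS,
    matcher=lambda labels: labels.get("error_code", "") != "",
    by=("error_code", "namespace", "pod", "cluster_id"),
    window_end=1200,
    window_seconds=20 * 60,
)

for key, value in sorted(api_error_groups.items()):
    print(key, value)
```

Both `AccessDenied` series fall into one group because `operation` is not in the `by` clause, whereas an un-aggregated `increase(...) > 0` returns one result per full label set. The reconcile alert follows the same pattern with a `result == "error"` matcher over `controller_runtime_reconcile_total` and `by=("controller", "namespace", "pod", "cluster_id")`.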