Add thumb response time runbooks
stacimc committed Sep 21, 2023
1 parent 70336cd commit 00214f7
Showing 4 changed files with 112 additions and 0 deletions.
@@ -0,0 +1,28 @@
# Run Book: API Thumbnails Production Average Response Time above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @stacimc
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Thumbnails+Production+Average+Response+Time+above+threshold>
```

## Severity Guide

First confirm that the service is not in a total outage. If it is not, the
severity is likely low. Check for a recent deployment that may have introduced
the problem and, if there is one, roll back to the previous version. If there is
no such deployment, check the request count and general network activity. If
either is abnormally high, refer to the [traffic analysis run
book][traffic_runbook] to identify and block any malicious traffic.

[traffic_runbook]:
/meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md
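
The order of checks in this guide can be read as a small decision procedure. The
Python sketch below is illustrative only and is not part of any tooling: the
`Observation` fields, the `triage` function, the "high" severity for a total
outage, and the fallback "keep investigating" step are all assumptions made for
the example. The guide itself only says that severity is likely low when there
is no total outage.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical summary of what the responder has checked so far."""

    total_outage: bool  # is the thumbnails service entirely down?
    recent_deployment: bool  # was a new version deployed shortly before the alarm?
    traffic_abnormally_high: bool  # is request count or network activity unusually high?


def triage(obs: Observation) -> tuple[str, str]:
    """Return (likely severity, suggested next step), checked in the guide's order."""
    if obs.total_outage:
        # Assumption: the guide does not name a severity for this case.
        return ("high", "treat as an outage")
    if obs.recent_deployment:
        return ("low", "roll back to the previous version")
    if obs.traffic_abnormally_high:
        return ("low", "follow the traffic analysis run book")
    # Assumption: the guide does not say what to do when none of the checks match.
    return ("low", "keep investigating")
```

For example, `triage(Observation(False, True, True))` suggests the rollback
first, because the deployment check comes before the traffic check.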

## Historical false positives

Nothing registered to date.

## Related incident reports

- 2023-09-05 at 22:15 UTC: Unhealthy thumbnail tasks restarted
- 2023-07-27 at 19:14 UTC: API Thumbnails unhealthy hosts
@@ -0,0 +1,28 @@
# Run Book: API Thumbnails Production Average Response Time anomalously high

```{admonition} Metadata
Status: **Unstable**
Maintainer: @stacimc
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Thumbnails+Production+Average+Response+Time+anomalously+high>
```

## Severity Guide

First confirm that the service is not in a total outage. If it is not, the
severity is likely low. Check for a recent deployment that may have introduced
the problem and, if there is one, roll back to the previous version. If there is
no such deployment, check the request count and general network activity. If
either is abnormally high, refer to the [traffic analysis run
book][traffic_runbook] to identify and block any malicious traffic.

[traffic_runbook]:
/meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md

## Historical false positives

Nothing registered to date.

## Related incident reports

- 2023-09-05 at 22:15 UTC: Unhealthy thumbnail tasks restarted
- 2023-07-27 at 19:14 UTC: API Thumbnails unhealthy hosts
@@ -0,0 +1,28 @@
# Run Book: API Thumbnails Production P99 Response Time above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @stacimc
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Thumbnails+Production+P99+Response+Time+above+threshold>
```

## Severity Guide

First confirm that the service is not in a total outage. If it is not, the
severity is likely low. Check for a recent deployment that may have introduced
the problem and, if there is one, roll back to the previous version. If there is
no such deployment, check the request count and general network activity. If
either is abnormally high, refer to the [traffic analysis run
book][traffic_runbook] to identify and block any malicious traffic.

[traffic_runbook]:
/meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md

## Historical false positives

Nothing registered to date.

## Related incident reports

- 2023-09-05 at 22:15 UTC: Unhealthy thumbnail tasks restarted
- 2023-07-27 at 19:14 UTC: API Thumbnails unhealthy hosts
@@ -0,0 +1,28 @@
# Run Book: API Thumbnails Production P99 Response Time anomalously high

```{admonition} Metadata
Status: **Unstable**
Maintainer: @stacimc
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Thumbnails+Production+P99+Response+Time+anomalously+high>
```

## Severity Guide

First confirm that the service is not in a total outage. If it is not, the
severity is likely low. Check for a recent deployment that may have introduced
the problem and, if there is one, roll back to the previous version. If there is
no such deployment, check the request count and general network activity. If
either is abnormally high, refer to the [traffic analysis run
book][traffic_runbook] to identify and block any malicious traffic.

[traffic_runbook]:
/meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md

## Historical false positives

Nothing registered to date.

## Related incident reports

- 2023-09-05 at 22:15 UTC: Unhealthy thumbnail tasks restarted
- 2023-07-27 at 19:14 UTC: API Thumbnails unhealthy hosts
