From 50005d1aec1dc37694fc128a44d3a7e008e4313f Mon Sep 17 00:00:00 2001
From: Olga Bulat
Date: Mon, 25 Sep 2023 10:14:56 +0300
Subject: [PATCH] Add runbooks for Nuxt response times alarms (#3012)

* Add Nuxt response time alarm runbooks

* Update documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md

Co-authored-by: Krystle Salazar

* Update documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md

Co-authored-by: Krystle Salazar

---------

Co-authored-by: Krystle Salazar
---
 .../meta/monitoring/runbooks/index.md         |  6 ++--
 .../nuxt_avg_response_time_above_threshold.md | 31 +++++++++++++++++++
 .../nuxt_p99_response_time_above_threshold.md | 31 +++++++++++++++++++
 3 files changed, 66 insertions(+), 2 deletions(-)
 create mode 100644 documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
 create mode 100644 documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md

diff --git a/documentation/meta/monitoring/runbooks/index.md b/documentation/meta/monitoring/runbooks/index.md
index 4754b9345eb..3c4745f0a52 100644
--- a/documentation/meta/monitoring/runbooks/index.md
+++ b/documentation/meta/monitoring/runbooks/index.md
@@ -12,15 +12,17 @@ that can be a good resource when writing a new one.
 ```{toctree}
 :titlesonly:
 
+api_request_count_above_threshold
 api_http_2xx_under_threshold
 api_http_5xx_above_threshold
-api_request_count_above_threshold
 api_avg_response_time_above_threshold
 api_avg_response_time_anomaly
 api_p99_response_time_above_threshold
 api_p99_response_time_anomaly
+nuxt_request_count
 nuxt_2xx_under_threshold
 nuxt_5xx_above_threshold
-nuxt_request_count
+nuxt_avg_response_time_above_threshold
+nuxt_p99_response_time_above_threshold
 unhealthy_ecs_hosts
 ```
diff --git a/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
new file mode 100644
index 00000000000..c3874021ef0
--- /dev/null
+++ b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
@@ -0,0 +1,31 @@
+# Run Book: Nuxt Production Average Response Time above threshold
+
+```{admonition} Metadata
+Status: **Unstable**
+Maintainer: @obulat
+Alarm link:
+-
+```
+
+## Severity Guide
+
+To identify the source of the slowdown, first check whether a recent deployment
+may have introduced the problem; if so, roll back to the previous version.
+Otherwise, check the following, in order:
+
+1. Request count and general network activity. If abnormally high, refer to the
+   [traffic analysis run book][traffic_runbook] to identify whether there is
+   malicious traffic. If not, move on.
+2. Check if dependencies like the API or Plausible analytics are constrained. If
+   stable, move on.
+
+[traffic_runbook]:
+  /meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md
+
+## Historical false positives
+
+Nothing registered to date.
+
+## Related incident reports
+
+- 2023-06-13 at 03:50 UTC: Frontend increased response times (reason unknown)
diff --git a/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md
new file mode 100644
index 00000000000..dea1ea3d4ab
--- /dev/null
+++ b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md
@@ -0,0 +1,31 @@
+# Run Book: Nuxt Production P99 Response Time above threshold
+
+```{admonition} Metadata
+Status: **Unstable**
+Maintainer: @obulat
+Alarm link:
+-
+```
+
+## Severity Guide
+
+To identify the source of the slowdown, first check whether a recent deployment
+may have introduced the problem; if so, roll back to the previous version.
+Otherwise, check the following, in order:
+
+1. Request count and general network activity. If abnormally high, refer to the
+   [traffic analysis run book][traffic_runbook] to identify whether there is
+   malicious traffic. If not, move on.
+2. Check if dependencies like the API or Plausible analytics are constrained. If
+   stable, move on.
+
+[traffic_runbook]:
+  /meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md
+
+## Historical false positives
+
+Nothing registered to date.
+
+## Related incident reports
+
+- 2023-06-13 at 03:50 UTC: Frontend increased response times (reason unknown)