Baseline monitoring changes for Terraform and the unhealthy host count alarm #2499

Closed
sarayourfriend opened this issue Jun 29, 2023 · 1 comment
Labels
💻 aspect: code (Concerns the software code in the repository)
🌟 goal: addition (Addition of new feature)
🟧 priority: high (Stalls work on the project or its dependents)
🧱 stack: infra (Related to the Terraform config and other infrastructure)

sarayourfriend (Collaborator) commented Jun 29, 2023

Description

Project thread: #2344
Implementation plan: https://docs.openverse.org/projects/proposals/monitoring/20230606_implementation_plan_ecs_alarms.html

  1. Create the monitoring modules for the frontend and API in staging and production, and move the existing alarms into them
  • Create a new next/modules/monitoring directory with a subdirectory for each service/environment: staging-frontend, production-frontend, staging-api, etc.
  • This includes moving the UptimeRobot configuration for each service, as well as the database and Redis monitors
  • Rename service-monitors to service-uptime-robot to clarify the module's purpose
  • Also create the new SNS topic for the unstable alerts' notification channel
  • This does not include moving the ECS service CloudWatch dashboard module, which should remain in the root modules
  2. Create the unhealthy host count alarm for production and staging services
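
For illustration, the SNS topic and the unhealthy host count alarm described above could look roughly like the following. This is a minimal sketch, not the configuration from the implementation plan: it assumes the services sit behind an Application Load Balancer, and the variable names, topic name, threshold, period, and evaluation window are all hypothetical placeholders.

```hcl
# Hypothetical inputs; the real modules may wire these up differently.
variable "service_name" {
  type        = string
  description = "Service and environment, e.g. staging-api (assumed naming)"
}

variable "target_group_arn_suffix" {
  type        = string
  description = "ARN suffix of the service's ALB target group"
}

variable "load_balancer_arn_suffix" {
  type        = string
  description = "ARN suffix of the service's load balancer"
}

# Notification channel for alarms that are not yet stabilised.
# The topic name is an assumption.
resource "aws_sns_topic" "unstable_alerts" {
  name = "unstable-alerts"
}

# Alarm when the target group reports any unhealthy hosts.
# Threshold and timing values are placeholders, not tuned values.
resource "aws_cloudwatch_metric_alarm" "unhealthy_host_count" {
  alarm_name          = "${var.service_name}-unhealthy-host-count"
  alarm_description   = "Unhealthy hosts reported for ${var.service_name}"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  period              = 60
  evaluation_periods  = 5
  treat_missing_data  = "notBreaching"

  dimensions = {
    TargetGroup  = var.target_group_arn_suffix
    LoadBalancer = var.load_balancer_arn_suffix
  }

  alarm_actions = [aws_sns_topic.unstable_alerts.arn]
  ok_actions    = [aws_sns_topic.unstable_alerts.arn]
}
```

Routing the alarm to a separate "unstable" topic keeps untuned alarms out of the main notification channel until their thresholds have been validated.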

Note

The staging monitoring modules for each service will contain only the unhealthy host count alarm.

Additional context

This issue will remain open until the unhealthy host count alarm is stabilised. However, once the Terraform configuration changes and the new monitoring modules are in place, all other alarm issues in this milestone will be unblocked.

sarayourfriend added the 🟧 priority: high, 🌟 goal: addition, 💻 aspect: code, and 🧱 stack: infra labels Jun 29, 2023
sarayourfriend added this to the ECS Alarms milestone Jun 29, 2023
github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Jun 29, 2023
zackkrida moved this from 📋 Backlog to 📅 To do in Openverse Backlog Jul 5, 2023
sarayourfriend moved this from 📅 To do to 🏗 In progress in Openverse Backlog Jul 17, 2023
krysal (Member) commented Jul 28, 2023

Done in WordPress/openverse-infrastructure#558.

krysal closed this as completed Jul 28, 2023
github-project-automation bot moved this from 🏗 In progress to ✅ Done in Openverse Backlog Jul 28, 2023