Meaningful ECS service health checks #10545

jamieparkinson commented Jan 4, 2024

Some of our web application / API services expose health check endpoints, but none of them performs a meaningful check of service health.

There are 2 primary consequences of this:

  • ECS doesn't know that it needs to restart applications that have got into a broken state (we have had to do this manually sometimes)
  • During deployments we see a small number of failed requests because the service cannot yet serve requests at the time it receives them (see Fix catalogue-api request errors during deployment #10513)
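
To illustrate what a "meaningful" check could look like, here is a minimal sketch. It assumes a health endpoint that aggregates a set of dependency checks and returns 503 when any of them fails, so that ECS or the load balancer can treat the task as unhealthy. The function name `run_health_check` and the dependency names are hypothetical and are not taken from any of our services.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a check returns True when the dependency is usable.
Check = Callable[[], bool]


def run_health_check(checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check and return (HTTP status, per-check report)."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A check that raises counts as unhealthy rather than crashing the endpoint.
            report[name] = f"error: {exc}"
    # 200 only when every check passed; 503 signals the task is not ready to serve.
    status = 200 if all(value == "ok" for value in report.values()) else 503
    return status, report
```

A handler for the health check path would call `run_health_check` with checks that exercise whatever the service actually depends on, and return the status code as the HTTP response status.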

Here are the services which need health checks adding/improving:

jamieparkinson converted this from a draft issue Jan 4, 2024
jamieparkinson self-assigned this Jan 4, 2024
kenoir self-assigned this Jan 8, 2024
kenoir moved this from In progress to Next in Digital platform Jan 11, 2024
kenoir moved this from Next to In progress in Digital platform Jan 11, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 12, 2024
kenoir reopened this Jan 14, 2024
kenoir moved this from Done to In progress in Digital platform Jan 14, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 19, 2024
kenoir reopened this Jan 19, 2024
kenoir closed this as completed Jan 19, 2024
kenoir reopened this Jan 19, 2024
kenoir moved this from Done to Ready for review in Digital platform Jan 19, 2024
kenoir moved this from Ready for review to In progress in Digital platform Jan 19, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 19, 2024

kenoir commented Jan 19, 2024

Flagging that for wellcomecollection.org the healthcheck endpoints are HTTP but still terminate at the nginx container: https://github.com/wellcomecollection/platform-infrastructure/blob/main/images/dockerfiles/nginx/frontend.nginx.conf#L24

kenoir reopened this Jan 22, 2024
kenoir moved this from Done to In progress in Digital platform Jan 22, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 23, 2024
kenoir reopened this Jan 25, 2024
kenoir moved this from Done to In progress in Digital platform Jan 26, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 26, 2024
kenoir reopened this Jan 26, 2024
kenoir moved this from Done to In progress in Digital platform Jan 26, 2024
kenoir closed this as completed Jan 26, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 26, 2024
kenoir linked a pull request Jan 26, 2024 that will close this issue
2 tasks
kenoir reopened this Jan 26, 2024
kenoir moved this from Done to In progress in Digital platform Jan 26, 2024
github-project-automation bot moved this from In progress to Done in Digital platform Jan 26, 2024
kenoir reopened this Jan 26, 2024
kenoir moved this from Done to In progress in Digital platform Jan 26, 2024
kenoir moved this from In progress to Ready for review in Digital platform Feb 6, 2024
github-project-automation bot moved this from Ready for review to Done in Digital platform Feb 6, 2024
pollecuttn moved this from Done to Archive in Digital platform Feb 15, 2024