Adds a simple healthcheck endpoint for the items service #739

kenoir · 2024-01-14T15:06:30Z

What does this change?

This change follows #736 and adds an HTTP healthcheck to the items API, so that the Scala service must have started before the NLB registers it as healthy and it starts serving requests.

How to test?

- ~~Manually check this healthcheck behaves as expected when running the items API locally~~ edit: we've improved the local testing situation but not resolved it; I feel OK trying this in stage for now
- Deploy this change to stage to ensure that Terraform applies the change as expected and tasks register as healthy
- Perform a stage deployment while sending requests to check whether we have eliminated the deployment errors

How can we measure success?

No downtime during deployments, resulting in a better experience for visitors to the site and fewer errors in the alerts channel that we cannot effectively respond to.

Currently we see only items errors during deployment of the catalogue API, as seen in #wc-platform-alerts in Slack during a deployment that followed the update to the search service healthcheck.

Have we considered potential risks?

Changing the healthchecks changes the failure modes for the API, but we tested this principle successfully in #736, so the risk is reduced.


agnesgaroux · 2024-01-25T13:59:48Z

Not sure how this works. Does it check /management/healthcheck before hitting /works every time? Then if the healthcheck fails (for whatever reason, could be something other than the instance being currently in deployment) the LB what? tries another instance?

kenoir · 2024-01-25T14:11:51Z

> Not sure how this works. Does it check /management/healthcheck before hitting /works every time? Then if the healthcheck fails (for whatever reason, could be something other than the instance being currently in deployment) the LB what? tries another instance?

This change only provides a new endpoint at /management/healthcheck that serves the following JSON and doesn't do anything else:

{
  "message": "ok"
}
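As a rough illustration of what an endpoint like this does, here is a minimal sketch in Python using only the standard library. It is not the Scala code from this PR: the handler class and helper names are hypothetical, and only the path and the JSON body are taken from the description above.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Path taken from the PR description; everything else here is illustrative.
HEALTHCHECK_PATH = "/management/healthcheck"


class HealthcheckHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers the healthcheck path and nothing else."""

    def do_GET(self):
        if self.path == HEALTHCHECK_PATH:
            body = json.dumps({"message": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthcheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_healthcheck(port):
    """Request the healthcheck and return (status code, decoded JSON body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", HEALTHCHECK_PATH)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

Calling `fetch_healthcheck(start_server().server_address[1])` exercises the endpoint once. The point of the sketch is that the handler does no work beyond replying, so a successful response only shows that the application is up and able to answer HTTP requests.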

The load balancer (NLB) uses this endpoint to determine whether the instance is healthy and allowed to serve requests. At present the NLB uses a TCP healthcheck, which only requires the nginx sidecar that proxies requests to the app to be available. Nginx comes up very quickly while the slowpoke Scala app is still yawning and blinking itself awake.

This change makes sure the Scala app is up by forcing it to serve requests before the load balancer determines it to be healthy. It doesn't do any more sophisticated checks as to whether it can actually serve works; there are some musings about that in Slack.
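To make the difference between the two kinds of check concrete, here is a small Python sketch. It is an assumption-laden stand-in and not the NLB or nginx configuration from this repository: `ProxyStandIn` plays the role of a sidecar that accepts connections straight away but answers 502 until a hypothetical `app_ready` flag is set, and `http_check` treats only a 200 response as healthy, which is a simplification.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag standing in for "the app behind the proxy has started".
app_ready = threading.Event()


class ProxyStandIn(BaseHTTPRequestHandler):
    """Accepts connections immediately, but only answers 200 once app_ready is set."""

    def do_GET(self):
        status = 200 if app_ready.is_set() else 502
        body = b'{"message": "ok"}' if status == 200 else b""
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_stand_in():
    """Start the stand-in proxy on a free local port and return the server."""
    server = HTTPServer(("127.0.0.1", 0), ProxyStandIn)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def tcp_check(host, port, timeout=2.0):
    """TCP-style check: healthy as soon as something accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/management/healthcheck", timeout=2.0):
    """HTTP-style check: healthy only if the request is answered with a 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

Before `app_ready.set()` is called, `tcp_check` already passes while `http_check` fails, which mirrors the window in which a target could be registered as healthy before the app can serve requests.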

kenoir requested a review from a team January 14, 2024 15:06

kenoir self-assigned this Jan 14, 2024

kenoir mentioned this pull request Jan 14, 2024

Meaningful ECS service health checks wellcomecollection/wellcomecollection.org#10545

Closed

13 tasks

kenoir linked an issue Jan 14, 2024 that may be closed by this pull request


kenoir force-pushed the rk/add-healthcheck-endpoint-items branch from 6bd3205 to 9a24603 Compare January 19, 2024 15:04

kenoir force-pushed the rk/add-sbt-dependency-graph branch from 1cf0fc1 to dace036 Compare January 19, 2024 15:09

kenoir force-pushed the rk/add-healthcheck-endpoint-items branch from c019866 to 44d7813 Compare January 23, 2024 11:45

Base automatically changed from rk/add-sbt-dependency-graph to main January 24, 2024 10:09

Adds a simple healthcheck endpoint for the items service

102de84


kenoir force-pushed the rk/add-healthcheck-endpoint-items branch from f864ffe to 102de84 Compare January 25, 2024 13:43

enable the http healthcheck for items api

3f7c0cb

kenoir force-pushed the rk/add-healthcheck-endpoint-items branch from 20fb200 to 3f7c0cb Compare January 25, 2024 13:46

kenoir marked this pull request as ready for review January 25, 2024 13:46

Apply auto-formatting rules

1da4ee6

agnesgaroux approved these changes Jan 25, 2024


kenoir mentioned this pull request Jan 25, 2024

Adds a simple healthcheck endpoint for the requests service #747

Merged

2 tasks

kenoir merged commit b600b23 into main Jan 25, 2024
1 check passed

kenoir deleted the rk/add-healthcheck-endpoint-items branch January 25, 2024 14:40
