Adds a simple healthcheck endpoint for the items service #739
Conversation
Not sure how this works. Does it check /management/healthcheck before hitting /works every time? And if the healthcheck fails (for whatever reason; it could be something other than the instance currently being deployed), what does the LB do? Try another instance?
This change only provides a new endpoint at /management/healthcheck, which responds with:

```json
{
  "message": "ok"
}
```

The load balancer (NLB) uses this endpoint to determine whether the instance is healthy and allowed to serve requests. At present the NLB uses a TCP healthcheck, which only requires that the nginx sidecar proxying requests to the app is available. Nginx comes up very quickly while the slowpoke scala app is still yawning and blinking itself awake. This change makes sure the scala app is up by forcing it to serve requests before the load balancer determines it to be healthy. It doesn't do any more sophisticated checks as to whether it can actually serve works; there are some musings about that in Slack.
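For illustration only: the items API has its own Scala HTTP stack, but an endpoint like this can be very small. This sketch uses the JDK's built-in HTTP server rather than the project's actual framework, and the port is an assumption:

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress

object HealthcheckEndpoint {
  // Start a server exposing GET /management/healthcheck on the given port.
  // The handler does nothing clever: if the app is up enough to serve this,
  // the load balancer may consider the instance healthy.
  def start(port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/management/healthcheck", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        val body = """{"message": "ok"}""".getBytes("UTF-8")
        exchange.getResponseHeaders.set("Content-Type", "application/json")
        exchange.sendResponseHeaders(200, body.length.toLong)
        exchange.getResponseBody.write(body)
        exchange.close()
      }
    })
    server.start()
    server
  }

  def main(args: Array[String]): Unit = start(8080) // port 8080 is hypothetical
}
```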
What does this change?
This change follows #736, and adds an HTTP healthcheck to the items API to ensure the scala service has started before it is registered healthy at the NLB and starts serving requests.
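As an illustration of what the NLB's new check exercises, a client-side probe could look like the following. The base URL and port are assumptions, and `HealthcheckSmokeTest` is a hypothetical helper, not part of this PR; the real check is configured on the load balancer's target group:

```scala
import scala.io.Source
import scala.util.Try

object HealthcheckSmokeTest {
  // True only if the app itself (not just the nginx sidecar) answers the
  // healthcheck with the expected JSON body; a refused connection while the
  // scala app is still starting up counts as unhealthy.
  def isHealthy(baseUrl: String): Boolean =
    Try(Source.fromURL(s"$baseUrl/management/healthcheck").mkString)
      .toOption
      .exists(body => body.contains("\"message\"") && body.contains("\"ok\""))

  def main(args: Array[String]): Unit =
    println(if (isHealthy("http://localhost:8080")) "healthy" else "unhealthy")
}
```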
How to test?
Manually check this healthcheck behaves as expected locally, running the items API.

edit: we've improved the testing-locally situation but not resolved it; I feel ok to try this in stage for now.

How can we measure success?
No downtime during deployments resulting in a better experience for visitors to the site, and fewer errors that we cannot effectively respond to in the alerts channel.
Currently we only see items errors during deployment of the catalogue API; see #wc-platform-alerts in Slack during a deployment following updating the search service healthcheck:

[screenshot from #wc-platform-alerts omitted]

Have we considered potential risks?
Changing the healthchecks changes the failure modes for the API, although we have tested this principle successfully in #736, so the risk is reduced.