Custom path for health check (eg. /healthz) #2938

Closed
Maxwell2022 opened this issue Apr 13, 2023 · 6 comments · Fixed by #4145
@Maxwell2022

Is your feature request related to a problem? Please describe.
I want to be able to set a custom path for the health check endpoint.

Describe the solution you'd like

health_check:
  listen: 0.0.0.0:4000
  enabled: true
  path: /healthz

Describe alternatives you've considered
Update our Kubernetes configuration

Additional context
I'm migrating from @apollo/gateway to Apollo Router. We should be able to deploy our new gateway (the router) without changing the configuration of its corresponding service in Kubernetes. We adopted a GCP standard across all our services of probing the /healthz endpoint. Unfortunately, Apollo Router hardcodes this endpoint to /health, and therefore we cannot deploy with zero downtime.
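
For reference, the probe configuration we standardize on looks roughly like the following (the port matches the health_check listen address above; treat the exact values as illustrative rather than our literal manifests):

readinessProbe:
  httpGet:
    path: /healthz
    port: 4000
livenessProbe:
  httpGet:
    path: /healthz
    port: 4000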

@abernix
Member

abernix commented Apr 13, 2023

Unfortunately, Apollo Router hardcodes this endpoint to /health, and therefore we cannot deploy with zero downtime.

I'm not sure I understand why there would be an inability to deploy with zero downtime on account of the health check endpoint path. Could you elaborate on that?

@Maxwell2022
Author

Maxwell2022 commented Apr 13, 2023

Our pipeline deploys services by updating the Kubernetes Deployment configuration for the service in question, bumping the image version it should use (that's the standard way to deploy Kubernetes services).

If we deploy this service with a new health check path, the rollout will fail and revert to the previous version (because the new pods will be considered unhealthy).

If we update our Kubernetes configuration before we deploy the service, then it would fail the health check for our current gateway (which listens on /healthz).

We would have to deploy and update the Kubernetes configuration at the same time, and even that would be a problem, because the old pods would start failing the health check while the new pods come up. For reference, it takes around 1 to 2 minutes for AWS EKS to register a pod. I cannot see this happening without downtime.

Ideally, we could configure this health check path so that there is no breaking change when migrating from @apollo/gateway to Apollo Router.

@garypen
Contributor

garypen commented May 3, 2023

Hi @Maxwell2022. I wanted to be as close as I could be to 100% certain that changing the liveness/readiness configuration didn't behave as you described, so I set up an experiment in a Kubernetes cluster.

I created a custom router image with a health endpoint of /healthz. I then deployed this into my test cluster and observed it working as expected, with both liveness and readiness checks running against this custom router.
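
For reference, the probes in my test deployment were along these lines; only the path changed (to /health) between the two steps. Treat the port as illustrative (8088 is the router's default health check port):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8088
livenessProbe:
  httpGet:
    path: /healthz
    port: 8088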

I then modified the deployment to use a standard router image (1.13.1, but any recent version would be fine) with a health endpoint of /health, and updated the liveness and readiness checks from /healthz to /health. Here's what happened to the pods, which I watched with the -w flag:

router-5c78545b7b-p545s   1/1     Running   0          85s
router-bf6ddccd8-vz2lb    0/1     Pending   0          0s
router-bf6ddccd8-vz2lb    0/1     Pending   0          0s
router-bf6ddccd8-vz2lb    0/1     ContainerCreating   0          0s
router-bf6ddccd8-vz2lb    0/1     Running             0          8s
router-bf6ddccd8-vz2lb    1/1     Running             0          8s
router-5c78545b7b-p545s   1/1     Terminating         0          6m23s
router-5c78545b7b-p545s   0/1     Terminating         0          6m24s
router-5c78545b7b-p545s   0/1     Terminating         0          6m24s
router-5c78545b7b-p545s   0/1     Terminating         0          6m24s

You'll notice that the new (/health) pod (vz2lb) has to start and transition to running before the old (/healthz) pod (p545s) is shut down. Whilst this transition was ongoing, I had another terminal making queries against the router service, and there was no interruption in service.

This is because, for a Deployment with the default update strategy of RollingUpdate, old pods continue to execute with the existing configuration until replacement pods (with the new configuration) transition to running. The Kubernetes documentation on Deployments gives a good description of how this works.
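
For completeness, the defaults at play here are roughly equivalent to spelling this out in the Deployment (a sketch of the default RollingUpdate parameters):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra pods allowed above the desired count during the update
      maxUnavailable: 25%  # pods allowed to be unavailable during the update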

Summary: You should be able to update your Kubernetes configuration with confidence that it will work with no interruption in service, unless there is some other reason preventing this.

Note: I'm assuming your deployment uses the default RollingUpdate strategy and not Recreate. You don't mention which you use, and Recreate isn't widely used, but it would lead to an interruption in service on changing any aspect of your deployment configuration.

@Maxwell2022
Author

Hey @garypen, thanks for your message.

I think there are a few more things to consider here. The first is that the watch output you shared covers deploying a new container, but what happens if you change the readiness probe from /healthz to /health without deploying anything just yet? I'm pretty sure all pods would be flagged as "unhealthy" and the service would stop responding.

In your example it takes 8 seconds to pull and start the new container, which to me is unrealistic; we are closer to 120 seconds on AWS EKS using Fargate (and we are using RollingUpdate too).

Now, I'm unsure what the complexity is or why there is pushback on making this health check path configurable. I know that in our case it would help to keep our services consistent in how we deploy and smoke-test them in CI. At the moment we have a bunch of conditions just for Apollo Router, which is not ideal.

That said if there are good reasons then I'm happy to close this issue.

@garypen
Contributor

garypen commented May 4, 2023

Hi @Maxwell2022

There is no pushback against making the health check configurable. I'm just trying to make clear, for the benefit of anyone following this issue, that there is no technical requirement for this in a Kubernetes deployment of the router replacing an existing gateway deployment.

Regarding changing the readiness probe without changing the container (i.e. the implementation): that is clearly a misconfiguration and won't work. It won't affect any of your existing deployed pods (assuming the update strategy is RollingUpdate). That's not specific to a readiness check; it's just misconfiguring a deployment.

Regarding the amount of time it takes to create new pods: that doesn't affect the update, so it's also not a factor. The update will simply take longer, but the existing pods (with the existing configuration) will continue to operate until the new pods (with the new configuration) become ready.

We'd be more than happy to take a PR that allows configuring the endpoint of the health check. We'd want the current (fixed) endpoint to remain the default, and the implementation would need to ensure that there is no clash with other configurable endpoints.
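
To sketch what that might look like, the existing health_check section could grow a path option (hypothetical until implemented; the listen address shown is the current default, and /health would remain the default path):

health_check:
  listen: 127.0.0.1:8088
  enabled: true
  path: /healthz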

@Maxwell2022
Author

I'm just trying to make clear, for the benefit of anyone following this issue, that there is no technical requirement for this in a Kubernetes deployment of the router replacing an existing gateway deployment.

True 💯, this was just our concern at the time. Thanks for clarifying this.

We'd be more than happy to take a PR that allows configuring the endpoint of the health check

Might be an opportunity to jump on the crablang bandwagon. I cannot promise anything, though.

Thanks again Gary

aaronArinder pushed a commit to aaronArinder/router that referenced this issue Nov 4, 2023
resolves apollographql#2938

Adds a configuration option for custom health check endpoints, using
`/health` as default