Custom path for health check (eg. `/healthz`) #2938
I'm not sure I understand why there would be an inability to deploy with zero downtime on account of the health check endpoint path. Could you elaborate on that?
Our pipeline deploys services by updating the Kubernetes deployment configuration for the service we are deploying, bumping the image version the service uses (that's the standard way to deploy Kubernetes services). If we deploy this service with a new health check path, the deployment will fail and revert back to the previous version (because the new pods will be considered not healthy). If we update our Kubernetes configuration before we deploy the service, then it would fail the health check for our current gateway (which listens on `/healthz`).

We would have to deploy and update the Kubernetes configuration at the same time, and even this would be a problem because the old pods would start failing the health check while the new pods come up. For reference, it takes around 1 to 2 minutes for AWS EKS to register a pod. I cannot see this happening without downtime. Ideally we could set up this health check path so that there is no breaking change migrating from `@apollo/gateway` to the router.
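For illustration, the probe configuration in question looks something like the fragment below. This is a minimal sketch, not taken from the thread: the container name, image, port, and timing values are all assumptions. The point is that the existing service's readiness probe targets `/healthz`, so a router pod that only serves `/health` would never become ready under it.

```yaml
# Illustrative Kubernetes pod-spec fragment (names, image, port, and
# timings are assumptions). The probe targets the GCP-style /healthz
# path that the existing gateway serves; the router only serves /health.
containers:
  - name: gateway
    image: example.com/gateway:1.2.3   # hypothetical image
    ports:
      - containerPort: 4000
    readinessProbe:
      httpGet:
        path: /healthz   # standardized path across services
        port: 4000
      periodSeconds: 5
      failureThreshold: 3
```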
Hi @Maxwell2022. I wanted to be as close as I could be to 100% certain that changing liveness/readiness configuration didn't behave as you described, so I set up an experiment in a kubernetes cluster. I created a custom router image with a health endpoint of `/healthz` and probes to match. I then modified the deployment to use a standard router image (which serves the default `/health`), updated the probe paths accordingly, and watched the rollout.
You'll notice that the new pods (with the new configuration) become ready before the old pods are terminated. This is because, with a Deployment using the default update strategy of RollingUpdate, old pods are only removed once their replacements are ready.

Summary: You should be able to update your kubernetes configuration with confidence that it will work with no interruption in service, unless there is some other reason preventing this.

Note: I'm assuming your deployment uses the default RollingUpdate update strategy.
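For reference, the behaviour described above comes from the Deployment's update strategy. Here is a sketch with the defaults spelled out explicitly (the deployment name, labels, and image tag are hypothetical; `maxSurge`/`maxUnavailable` of 25% are the Kubernetes defaults):

```yaml
# Deployment fragment making the default RollingUpdate behaviour explicit.
# Old pods are only terminated once enough new pods report Ready via
# their probes, which is what preserves zero-downtime rollouts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: router   # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra pods allowed above `replicas` during rollout
      maxUnavailable: 25%  # pods allowed to be unavailable during rollout
  selector:
    matchLabels:
      app: router
  template:
    metadata:
      labels:
        app: router
    spec:
      containers:
        - name: router
          image: ghcr.io/apollographql/router:v1.x   # hypothetical tag
```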
Hey @garypen, thanks for your message. I think there are a few more things here to consider. The first one is that the watch output you shared covers deploying a new container, but what happens if you change the readiness probe from `/health` to `/healthz` without changing the container? In your example it takes 8 seconds to pull and start the new container; that, to me, is unrealistic. We are closer to 120 seconds in AWS EKS using Fargate (we are using RollingUpdate too).

Now, I'm unsure what the complexity is or why there is pushback on making this health check path configurable. I know that in our case it would help to have consistency across our services and in how we deploy / smoke test our services in CI. At the moment we have a bunch of conditions just for Apollo Router, which is not ideal. That said, if there are good reasons then I'm happy to close this issue.
Hi @Maxwell2022. There is no pushback to allowing the health check to be configurable; I'm just trying to make clear, for the benefit of anyone following this issue, that there is no technical requirement for this to exist for a kubernetes deployment of the router replacing an existing gateway deployment.

Regarding changing the readiness probe without changing the container (i.e. the implementation): that is clearly a mis-configuration and won't work. It won't affect any of your existing deployed pods (assuming the update strategy is RollingUpdate). That's not specific to a readiness check; that's just mis-configuring a deployment.

Regarding the amount of time to create new pods: that doesn't affect the update, so it's also not a factor. It will just take longer for the update to occur, but the existing pods (with the existing configuration) will continue to operate until the new pods (with the new configuration) become ready.

We'd be more than happy to take a PR which allows for configuring the endpoint of the health check. We'd want the current (fixed) endpoint to remain the default, and the implementation would need to ensure that there is no clash with other configurable endpoints.
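A sketch of what such configuration might look like, assuming the option lands roughly as proposed (the `path` key is an assumption based on this discussion, not a confirmed option at the time of this exchange; `listen` and `enabled` are existing router health check settings):

```yaml
# Hypothetical router.yaml fragment: health_check with an assumed
# `path` option. Per the maintainer's note, /health would remain the
# default when `path` is omitted.
health_check:
  listen: 127.0.0.1:8088
  enabled: true
  path: /healthz   # assumed new option; defaults to /health when omitted
```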
True 💯, this was just our concern at the time. Thanks for clarifying this
Might be an opportunity to jump on the crablang bandwagon. I cannot promise anything though. Thanks again Gary
resolves apollographql#2938 — Adds a configuration option for custom health check endpoints, using `/health` as default
Is your feature request related to a problem? Please describe.
I want to be able to set a custom path for the health check endpoint.
Describe the solution you'd like
A configuration option to customise the health check path, keeping `/health` as the default.
Describe alternatives you've considered
Update our Kubernetes configuration
Additional context
I'm migrating from `@apollo/gateway` to Apollo router. We should be able to deploy our new gateway (router) without changing the configuration of its corresponding service in kubernetes. We adopted a GCP standard for all our services, looking up the `/healthz` endpoint. Unfortunately Apollo router is hardcoding this endpoint to `/health`, and therefore we cannot deploy with zero downtime.