Ingress Gateway doesn't wait for connections to drain, and the LoadBalancer reports HTTP status code 502 errors #26302
Comments
Can you try setting …
@howardjohn thank you for your reply.
Yes, very likely the same. If you use the code from master, you will most likely see no issues.
Very nice, I'm looking forward to it, thanks.
Closing as this is fixed by #26524 in 1.8. Unfortunately we cannot backport due to dependencies on the Envoy version.
Bug description
We are running the Istio ingress gateway in AWS EKS.
We also have an AWS ALB in front of the ingress gateway, which is exposed via NodePort to support the Target Group (instance mode).
(The reason we don't use alb-ingress is that we can update the cluster more safely without it.)
We have noticed that the ALB reports status code 502 errors on some requests when the Istio ingress gateway pod is terminating during a rolling update, scale-in, etc.
So we think the pod goes down while HTTP keep-alive connections remain between the ALB and the ingress gateway, resulting in lost packets.
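For reference, the gateway is exposed to the ALB target group with a Service shaped roughly like this (an illustrative sketch only; the names, ports, and nodePort value are assumptions and not copied from our manifests):

apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: NodePort                 # instance-mode target group registers node ports
  selector:
    app: istio-ingressgateway
  ports:
  - name: https
    port: 443
    targetPort: 8443
    nodePort: 31443              # the ALB target group points at this port on every node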
We first tried running sleep in a preStop hook, increasing the terminationGracePeriodSeconds, setting ISTIO_META_IDLE_TIMEOUT, and increasing meshConfig.drainDurations, but there was no improvement.
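Roughly, what we tried looked like this (a sketch with illustrative values; the exact numbers are not the ones from our manifests):

# Fragment of the istio-ingressgateway pod spec (illustrative values)
spec:
  terminationGracePeriodSeconds: 120        # raised from the 30s default
  containers:
  - name: istio-proxy
    env:
    - name: ISTIO_META_IDLE_TIMEOUT         # shorten the proxy idle timeout so keep-alive connections are closed sooner
      value: "60s"
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 30"]  # delay SIGTERM so the ALB has time to deregister the target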
We finally found a solution: wait for connections to drain with the healthcheck/fail endpoint.
https://www.envoyproxy.io/docs/envoy/latest/operations/admin#operations-admin-interface-healthcheck-fail
After this endpoint is hit, Envoy responds with Connection: close on existing connections. This is the behavior we've been looking for.
https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/draining#arch-overview-draining
operator.yaml
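Roughly, the fragment we added looks like this (a sketch, not a verbatim copy of our file; the overlay path, the Envoy admin port 15000, curl being available in the gateway image, and the sleep value are all assumptions):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        overlays:
        - apiVersion: apps/v1
          kind: Deployment
          name: istio-ingressgateway
          patches:
          - path: spec.template.spec.containers.[name:istio-proxy].lifecycle
            value:
              preStop:
                exec:
                  # fail the Envoy health check so it starts draining, then keep the
                  # pod alive long enough for keep-alive connections to be closed
                  command: ["sh", "-c", "curl -s -X POST http://localhost:15000/healthcheck/fail; sleep 30"]

terminationGracePeriodSeconds has to be longer than the sleep, otherwise the pod is killed before draining finishes.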
This solution seems to be used by Contour too: https://projectcontour.io/docs/master/redeploy-envoy/
With this solution, the 502 errors have been eliminated, but are we missing something important somewhere in the documentation?
If you have a more general solution for Istio users like us, I'd like to know about it. Thank you.
Affected product area
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Steps to reproduce the bug
Run the Istio ingress gateway, put an AWS ALB in front of it, send requests to the application, and redeploy the ingress gateway (e.g. kubectl rollout restart deployment istio-ingressgateway).
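To observe the failures, one option is to poll the ALB while the rollout happens and watch for non-200 status codes (illustrative only; <alb-dns-name> stands for the ALB's DNS name):

while true; do
  # print only the HTTP status code of each request
  curl -s -o /dev/null -w '%{http_code}\n' "http://<alb-dns-name>/"
  sleep 0.2
done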
Version
AWS EKS 1.17
Istio 1.6.2
Installation
Generated the YAML file with istioctl manifest generate -f from the operator YAML, and applied it with kubectl apply -f.
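Concretely, something like this (file names are illustrative):

istioctl manifest generate -f operator.yaml > istio-generated.yaml
kubectl apply -f istio-generated.yaml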
Environment
AWS EKS