gRPC streaming keepAlive ping never fails when proxied through Envoy #2086
Comments
I'm not exactly sure what "keep alive" means in your setup, but if it means proxying PING frames, Envoy does not do that currently.
FYI, we are going to add streaming timeouts (basically, time out if no data frames are received in X seconds in either the upstream or downstream direction). This would be a potential fix for the above.
PING frames are only hop-by-hop in HTTP/2 and per-connection, and I don't know if there are any sensible proxying semantics if you are splitting a single client stream across multiple upstream hosts, for example. I think Envoy should respond to PING, but not forward it.
I agree. If asked to actually proxy PING, I was going to say no. :)
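(A quick way to see this connection-level behavior: open a raw HTTP/2 connection and send a PING yourself; whichever peer terminates that TCP connection (Envoy, when proxied) acks it, and nothing is forwarded upstream. The following is only an illustrative sketch using golang.org/x/net/http2's Framer over cleartext HTTP/2; the address and ping payload are made up.)

```go
package main

import (
	"fmt"
	"log"
	"net"

	"golang.org/x/net/http2"
)

func main() {
	// Hypothetical Envoy listener speaking cleartext HTTP/2 (prior knowledge).
	conn, err := net.Dial("tcp", "127.0.0.1:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Client connection preface, then SETTINGS, then a PING frame.
	if _, err := conn.Write([]byte(http2.ClientPreface)); err != nil {
		log.Fatal(err)
	}
	fr := http2.NewFramer(conn, conn)
	if err := fr.WriteSettings(); err != nil {
		log.Fatal(err)
	}
	if err := fr.WritePing(false, [8]byte{'p', 'i', 'n', 'g', 't', 'e', 's', 't'}); err != nil {
		log.Fatal(err)
	}

	// The PING ACK comes back from the directly connected peer (the proxy),
	// which is why a keepalive built on PING can't see past Envoy.
	for {
		frame, err := fr.ReadFrame()
		if err != nil {
			log.Fatal(err)
		}
		if pf, ok := frame.(*http2.PingFrame); ok && pf.IsAck() {
			fmt.Printf("PING ACK from peer: %q\n", pf.Data[:])
			return
		}
	}
}
```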
Sorry, I assumed you would know more about gRPC keepalive since Envoy normally has such great support for gRPC. In this case it is an option for gRPC streams where you set a time interval (called Time in the gRPC-Go library's KeepAliveParams config struct), and gRPC will then periodically ping the client to make sure that it hasn't disconnected ungracefully. A client that disconnects gracefully sends some sort of disconnect message to the server; a client that disconnects ungracefully sends nothing, so if this keepalive ping isn't turned on, every ungracefully disconnected client leaves a hanging connection on the server. It makes sense that Envoy responds to PING frames. Because HTTP/2 PINGs seem quite different from ICMP pings, I was hoping that the normal ping rules didn't apply. It's unfortunate that I can't use the keepalive feature of gRPC; I guess I will have to build my own application-level heartbeat.
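(The thread doesn't show what such an application-level heartbeat would look like. One possible shape, purely as a sketch: it assumes the streaming RPC can be made bidirectional so the client sends a small heartbeat message every ~20 seconds; all type names such as pb.MyService_StreamEventsServer and the s.events channel are hypothetical. Imports assumed: time, google.golang.org/grpc/codes, google.golang.org/grpc/status, and your generated pb package.)

```go
// Sketch only: detect a vanished client via an application-level heartbeat
// instead of HTTP/2 PING (which Envoy answers itself).
func (s *server) StreamEvents(stream pb.MyService_StreamEventsServer) error {
	const window = 60 * time.Second

	heartbeats := make(chan struct{})
	recvErr := make(chan error, 1)
	go func() {
		for {
			// Each received message is treated as a client heartbeat.
			if _, err := stream.Recv(); err != nil {
				recvErr <- err
				return
			}
			select {
			case heartbeats <- struct{}{}:
			case <-stream.Context().Done():
				return
			}
		}
	}()

	liveness := time.NewTimer(window)
	defer liveness.Stop()
	for {
		select {
		case ev := <-s.events: // s.events: hypothetical channel of outgoing messages
			if err := stream.Send(ev); err != nil {
				return err
			}
		case <-heartbeats:
			// Client proved it is alive; restart the liveness window.
			if !liveness.Stop() {
				<-liveness.C
			}
			liveness.Reset(window)
		case <-liveness.C:
			// No heartbeat within the window: treat the client as gone and
			// free the stream, even though the TCP connection to Envoy
			// still looks healthy.
			return status.Error(codes.DeadlineExceeded, "no heartbeat from client")
		case err := <-recvErr:
			return err
		case <-stream.Context().Done():
			return stream.Context().Err()
		}
	}
}
```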
Though the problem here is that my service thinks the request is coming from Envoy, so the PING goes to Envoy, but it is pretty clear that it is intended for the client. Am I getting that wrong?
@cdelguercio the issue is that the keep-alive mode expects an absence of ping response to mean the other side is gone, and Envoy will always directly respond to ping. In general Envoy would reset the stream, but if there is no FIN and it never tries to write, that won't happen either. I'm pretty sure that #1778 will fix your issue (which I now see you also originally opened!). If that was in place, you could set data layer timeouts, at which point Envoy would shut down the stream/connection.
Right, and #1778 would fix it as long as the solution includes a heartbeat of some sort, since in my case the connection can (correctly) be open for an indefinite period of time without sending data and still be valid.
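(For readers landing here later: the stream idle timeout discussed above did eventually land in Envoy. As a rough illustration using modern http_connection_manager field names rather than the config syntax from the time of this thread, with placeholder cluster names and durations, the relevant knobs look like this:)

```yaml
# http_connection_manager options (sketch):
stream_idle_timeout: 300s   # reset streams that see no activity in either direction
route_config:
  virtual_hosts:
  - name: local_service
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: local_service
        timeout: 0s         # disable the per-request timeout for long-lived streams
        idle_timeout: 300s  # per-route override of the stream idle timeout
```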
I saw that the "http: adding 100-Continue support to Envoy" PR (#2497) got merged. Would it make sense to allow Envoy to be configured to proxy PING frames as a non-default option?
@cdelguercio per the previous discussion, I'm not really sure what it means to proxy ping. Ping is per connection, not per stream. What would the semantics be?
Would it be reasonable for Envoy to reset all of the "upstream streams" corresponding to a dead downstream TCP connection, and vice versa?
@mpuncel Envoy already does this. |
@mattklein123 Using the old closed issue for context... is there a way to use Envoy stream idle timeouts on a route that also handles gRPC streaming connections that may be idle for "a long time" (longer than the stream idle timeout)? What I'm looking for (maybe) is a way to keep a configured stream idle timeout from breaking gRPC streaming connections before grpc-timeout effectively fires.
@jrajahalme I don't think so currently. Please open a fresh issue for discussion.
Opened #5142 |
Title: gRPC streaming keepAlive ping never fails when proxied through Envoy
Envoy: envoyproxy/envoy-alpine:cd514cc3f1ad82bfd57b6b832b379eb9a2888891
gRPC: grpc-go 1.7.2
Description:
I have a Docker setup where I am running Envoy and a gRPC service in a single container. Envoy is proxying port 80 to port 8000, where the service is listening. The gRPC service has a server->client unidirectional streaming endpoint with keepAlive enabled, so that a client that disconnects ungracefully won't leave a hanging connection. When I connect to my service directly and Ctrl-Z my test client, the server notices within ~30 seconds that a keepAlive HTTP/2 PING has failed and closes the connection. When I connect to my service through Envoy and Ctrl-Z my test client, the connection hangs forever.
I test this locally by running my docker container, and then from my local machine I first point my gRPC test client to port 8000 to bypass Envoy. I get the following results on Wireshark on the docker0 interface:
At the end, there are 3 groups of 3 TCP frames at 55, 85, and 115 seconds on port 8000. These are obviously the keepAlive HTTP/2 PINGs.
Here is what happens when I go through Envoy on port 80:
Here I see the actual HTTP/2 traffic, but only on the initial connection. No matter how long I listen, I never see any keepAlive frames. I assume my service is still sending the keepAlive PINGs to Envoy on the Docker container's loopback interface, but I don't know an easy way to capture that.
gRPC KeepAlive Go config:
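(A hedged sketch of what a server-side gRPC-Go keepalive config of this kind typically looks like; the 30-second interval is inferred from the ping spacing observed above, and the commented-out registration call is a placeholder, not the reporter's actual service.)

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":8000")
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:    30 * time.Second, // ping a client that has been idle this long
			Timeout: 20 * time.Second, // close the connection if the ping isn't acked in time
		}),
	)

	// pb.RegisterMyServiceServer(srv, &myService{}) // hypothetical generated registration

	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```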
Envoy config:
Notice that I have a separate route for my streaming endpoint, because I needed to set timeout_ms: 0 on it so the route timeout doesn't kill the long-lived stream.
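(A rough sketch of the kind of routes fragment being described, in the v1-style JSON config Envoy used at the time; the prefixes and cluster name are placeholders, not the actual values:)

```json
"routes": [
  { "prefix": "/my.package.MyService/StreamEvents", "cluster": "local_service", "timeout_ms": 0 },
  { "prefix": "/", "cluster": "local_service" }
]
```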