Remove ChannelImpl.ping (or possibly mark it experimental) #737
We know it has users in the short term. But mark it experimental so we can remove it if its usefulness goes away.
So, in the long term, is there expected to be no value in using HTTP/2 ping frames for testing or keeping connections alive? I think this is generally useful, both for standard client-server communication and for inter-DC server-to-server communication, and I would like to see it supported in all three implementations. If it gets dropped, what is the replacement? I'm not a big fan of the health check API since it involves polling. A ping can be used to do less frequent and lighter-weight polling for connectivity, and then a streaming API could be used so servers push state changes instead of requiring that clients poll for them.
Some thoughts:
One other thought: I have actually implemented probing before as a full RPC server endpoint. This is pretty easy to do, lightweight, and can be implemented by users with a lot of leeway.
Many platforms don't give great access to, or the ability to configure, TCP keepalives. For example, on Linux the keepalive frequency is a kernel parameter, and keepalive defaults to off unless the socket is opened with the SO_KEEPALIVE socket option (and socket options can't be configured through the gRPC APIs for creating transports, at least not in the Java implementation). ICMP packets are often filtered. Furthermore, keepalive and ICMP acks are typically handled by the OS network stack, so they may succeed even if the application could not respond (due to overload, throttling, deadlock, etc.). An HTTP/2 ping therefore lets the application ensure that the connection is tested at a reasonable frequency.

I understand that extra surface area in the API poses a maintenance burden, but I feel this is a very reasonable and useful part of the API (certainly more practical and useful than TCP keepalives or ICMP). Nothing in the gRPC spec explicitly mentions pings, but gRPC is based on HTTP/2, and all of the implementations already reply as expected when a ping is received since that is part of the HTTP/2 spec. gRPC is based on HTTP/2, so why not embrace some of the protocol's affordances?

About using RPCs to poll the server (instead of HTTP/2 ping frames) to ask whether it is taking traffic: I stated in my initial comment that I'd prefer servers push state changes (via an RPC with a streaming response) instead of requiring clients to poll them. To get good response times (e.g. low latency between a server deciding it doesn't want any more traffic and clients actually respecting that), clients may have to poll for that state frequently, which is wasteful since the state likely changes very infrequently. Letting servers push that information means you can use infrequent pings and still have optimal response times to changes in server health. So the server can tell the client when it is unhealthy.
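As an aside, the OS-level knob discussed above is the plain SO_KEEPALIVE socket option. A minimal JDK-only sketch (not a gRPC API; gRPC transports don't expose this hook) of what enabling it looks like:

```java
import java.net.Socket;

public class KeepaliveDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();        // not yet connected
        socket.setKeepAlive(true);           // sets SO_KEEPALIVE on the socket
        System.out.println(socket.getKeepAlive()); // prints "true"
        // Note: on Linux the probe timing (interval, count) still comes from
        // kernel sysctls such as net.ipv4.tcp_keepalive_time, not this API.
        socket.close();
    }
}
```

This also illustrates the point above: even with the option set, the probes are answered by the kernel's network stack, so they say nothing about whether the application is still making progress.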
An occasional ping is then used to discover if the server becomes unreachable (or so overloaded that it cannot reply in a timely way). For client-side request routing, that's a signal not to use that particular connection/channel (akin to receiving an "I'm not healthy" message from the server). The client would then periodically attempt to re-connect to the server (with exponential back-off) until it becomes reachable again. In my opinion, this is the "right way"™ to do health checking. So I'd really like to see gRPC support it (even if not directly in an opinionated way, then at least in the "yes, it's possible without forking" way).
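The reconnect-with-exponential-back-off loop described above can be sketched as follows. The initial and maximum delays here are made up for illustration; this is not grpc-java's actual reconnect policy (which also adds jitter):

```java
public class BackoffDemo {
    /** Doubles the delay on each attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long initialMillis, long maxMillis) {
        double delay = initialMillis * Math.pow(2, attempt);
        return (long) Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // Hypothetical policy: start at 1 s, cap at 20 s.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                + backoffMillis(attempt, 1000, 20000) + " ms");
        }
        // Waits: 1000, 2000, 4000, 8000, 16000, 20000 ms
    }
}
```

In practice each failed ping or connect attempt would advance `attempt`, and a successful reconnect would reset it to zero.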
This issue was made based on something I had said. It isn't trying to kill ping functionality altogether, but it seems there may be a better place for it than exposed on the Channel. I say that because I don't want someone to even have to think about doing pings on a Channel in order to keep it alive. Once we have load-balancing support, I'd much rather Channel or one of the load balancing SPIs do the PINGing. The transports would still support PINGing. @jhump, I'm sorry to alarm you with this. I wasn't going to let it get deleted before discussing with you. I also know there is no alternative API today, so it can't really be removed. I did want to make sure it didn't become part of the permanent API though, since the keepalive and health checking implementation is still in its infancy.
TCP keepalives are pretty reasonable for noticing network breakages and keeping the TCP connection from getting dropped by one of the endpoints or a router. But it is also true they don't guarantee that the application is still processing. Yes, we don't expose a way to enable keepalive, but the same could have been said about PING before we (a.k.a., @jhump) added support :).
Not really a replacement, since it won't keep the TCP connection alive. And as was mentioned, it is commonly filtered, doesn't have much of a shared fate, etc. Probably not useful at all to us.
I've been pushing behind the scenes to make sure that PINGs are supported appropriately. PING is a required part of the HTTP/2 spec, so it should be implemented even though things would appear to work today if we didn't. I have high hopes of using PING to measure bandwidth-delay product to enable auto-scaling of the flow control windows. Users shouldn't have to specify the window size in order to get proper throughput from their network; it's 2015.
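To illustrate why PING-measured round-trip time matters for window sizing, here is the bandwidth-delay-product arithmetic. The numbers are hypothetical and this is only the back-of-the-envelope version of the idea, not how a real estimator is implemented:

```java
public class BdpDemo {
    /** BDP = bandwidth x RTT: the amount of data "in flight" on the path. */
    static long windowBytes(long bandwidthBytesPerSec, long rttMillis) {
        return bandwidthBytesPerSec * rttMillis / 1000;
    }

    public static void main(String[] args) {
        long bandwidth = 125_000_000L; // ~1 Gbit/s expressed in bytes/sec
        long rtt = 40;                 // 40 ms RTT, e.g. measured via HTTP/2 PING
        System.out.println(windowBytes(bandwidth, rtt) + " bytes");
        // 125,000,000 * 40 / 1000 = 5,000,000 bytes: the flow-control window
        // needed to keep this link full. HTTP/2's 64 KiB default would cap
        // throughput far below link capacity.
    }
}
```

The appeal of PING here is that it gives the transport an RTT sample without requiring users to measure and configure window sizes by hand.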
+1 to infrequent polling + notifications. GOAWAY is a pretty similar notification to me, especially since the health checking request isn't necessarily service-specific. There are some subtleties about hop-by-hop vs end-to-end: TCP keepalive, PING, and GOAWAY are hop-by-hop; a health checking service would be end-to-end.
Based on this discussion, will grpc-java implement a server-push mechanism to push server status to the client, or has it already done so?
@makeyang, so, server-push won't be used. There are two parts of the discussion: service health checking and keep alive (aka, TCP health checking).

Service health checking will use the gRPC health checking protocol. At some point in the future there will likely be a streaming version of it. The health checking may be used client->server, but there may also be a centralized health checker that coordinates with a load balancer.

To deal with networking issues that break TCP connections, we would need TCP keep alive, HTTP/2 pings, or similar, which was what this discussion was about. The details are still a bit up in the air, but there is agreement we need it. We wouldn't use server-push though.
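For reference, the gRPC health checking protocol mentioned here is a small unary service, roughly as below (sketched from the published `grpc.health.v1` protocol; a streaming `Watch` method was added to it later):

```proto
syntax = "proto3";

package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;  // empty string = overall server health
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
  }
  ServingStatus status = 1;
}

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
}
```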
It isn't clear that ping will be part of the long term gRPC interface (and is easy to add back in later if needed). Either remove it or mark it experimental.