Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ChannelImpl.ping (or possible mark it experimental) #737

Closed
carl-mastrangelo opened this issue Aug 6, 2015 · 8 comments
Closed
Assignees
Milestone

Comments

@carl-mastrangelo
Copy link
Contributor

It isn't clear that ping will be part of the long term gRPC interface (and is easy to add back in later if needed). Either remove it or mark it experimental.

@carl-mastrangelo carl-mastrangelo added this to the 0.8.0 milestone Aug 6, 2015
@ejona86
Copy link
Member

ejona86 commented Aug 6, 2015

We know it has users in the short-term. But mark it experimental so we can remove it if its usefulness goes away.

@jhump
Copy link
Member

jhump commented Aug 14, 2015

So, in the long-term, is there expected to be no value in using HTTP/2 ping frames, for testing / keeping alive connections? I think this is generally useful, for both standard client-server comms and for inter-DC server-to-server comms, and would like to see it supported in all three implementations.

If it gets dropped, what is the replacement? I'm not a big fan of the health check API since it involves polling. I think a ping can be used to do less frequent and lighter weight polling for connectivity, and then a streaming API could be used so server's push state changes instead of requiring that clients poll them.

@carl-mastrangelo
Copy link
Contributor Author

Some thoughts:

  • Is there value in using a ping frame? I would say there is. As you mention, it's lighter weight than full health checking. A better question to ask is it so valuable to justify having 2 similar methods for checking connectivity? I don't think there is enough additional value over health checking.
  • Ping is kind of in a weird point in the stack. Below it, there are TCP keep alives, which can be used to ensure the connection is up, and below that actual ICMP requests (note that I am not suggesting these are viable alternatives, but they could perform the same duty). Above http2 pings is actual health checking. What does http2 ping do that the others do not?
  • http2 ping is part of the http spec, but I am not sure if it is part of the grpc spec (if you can find a link that clarifies please post it). If someone implements grpc in another library, they may not implement it. What will happen to interoperability if one library implements ping and another doesnt?
  • Finally, I don't know if ping in general will tell you what you want. Often a server could be up, but not want to take additional traffic (shutdown scenarios, for instance) A successful ping would tell you the server is up, but doesn't tell you if you should send additional traffic. If there isn't a course of action to take upon knowing if the server is up, why bother checking?

@carl-mastrangelo
Copy link
Contributor Author

One other thought: I have actually implemented probing before as a full rpc server endpoint. This is pretty easy to do, light weight, and can be implemented by users with a lot of leeway.

@jhump
Copy link
Member

jhump commented Aug 14, 2015

Many platforms don't give great access to or ability to configure settings for TCP keepalives. For example, in Linux the keepalive frequency is a kernel parameter and it defaults to off unless the socket is opened with SO_KEEPALIVE socket option (and socket options can't be configured with GRPC APIs for creating transports, at least not in the Java implementation). And ICMP packages are often filtered. Furthermore, keepalive and ICMP acks are typically handled in the OS by the network stack, so they may succeed even if the application could not respond (due to overload, throttling, deadlock, etc).

So an HTTP/2 ping allows the application to ensure that the connection is tested at a reasonable frequency. I understand that extra surface area in the API poses a maintenance burden, but I feel that it's a very reasonable and useful part of the API (certainly more practical and useful than using TCP keepalives or ICMP).

Nothing in the GRPC spec explicitly mentions pings, but GRPC is based on HTTP/2. All of the implementations reply as expected when a ping is received since that is part of the HTTP/2 spec. GRPC is based on HTTP/2, so why not embrace some of the protocol's affordances?

About using RPCs to poll the server (instead of HTTP/2 ping frames) and ask it if it is taking traffic, I stated in my initial comment that I'd prefer servers push state changes (via an RPC with a streaming response) instead of requiring clients to poll them. To get good response times (e.g. latency between a server deciding it doesn't want anymore traffic and clients actually respecting that), clients may have to poll for that state frequently, which is wasteful since the state likely changes very infrequently. Letting servers push that information means you can use infrequent pings and still have optimal response times to changes in server health.

So the server can tell the client when it is unhealthy. An occasional ping is then used to discover if the server becomes unreachable (or so overloaded that it cannot reply in a timely way). For client-side request routing, that's a signal to not use that particular connection/channel (akin to receiving an "I'm not healthy" message from the server). The client would then periodically attempt to re-connect to the server (with exponential back-off) until it becomes reachable again.

In my opinion, this is "right way"™ to do health checking. So I'd really like to see GRPC support it (even if not directly in an opinionated way but in the "yes, it's possible without forking" way).

@ejona86
Copy link
Member

ejona86 commented Aug 14, 2015

This issue was made based on something I had said. It isn't trying to kill ping functionality all together, but it seems there may be a better place for it than exposed on the Channel. I say that because I don't want someone to even have to think about doing pings on a Channel in order to keep it alive.

Once we have load-balancing support, I'd much rather Channel or one of the load balancing SPIs do the PINGing. The transports would still support PINGing.

@jhump, I'm sorry to alarm you with this. I wasn't going to let it get deleted before discussing with you. I also know there is no alternative API today, so it can't really be removed. I did want to make sure it didn't become part of the permanent API though, since keepalives and health checking implementation is still in its infancy.

SO_KEEPALIVE

TCP keepalives are pretty reasonable for noticing network breakages and keeping the TCP connection from getting dropped by one of the endpoints or a router. But it is also true they don't guarantee that the application is still processing. Yes, we don't expose a way to enable keepalive, but the same could have been said about PING before we (a.k.a., @jhump) added support :).

ICMP

Not really a replacement, since it won't keep the TCP connection alive. And as was mentioned, it is commonly filtered, doesn't have much of a shared fate, etc. Probably not useful at all to us.

Nothing in the GRPC spec explicitly mentions pings, but GRPC is based on HTTP/2. All of the implementations reply as expected when a ping is received since that is part of the HTTP/2 spec.

I've been pushing behind the scenes to make sure that PINGs are supported appropriately. PING is a required part of the HTTP/2 spec, so it should be implemented even though things would appear to work today if we didn't.

I have high hopes of using PING to measure bandwidth-delay product to enable auto-scaling of the flow control windows. Users shouldn't have to specify the window size in order to get proper throughput from their network; it's 2015.

Letting servers push that information means you can use infrequent pings and still have optimal response times to changes in server health.

+1 to infrequent polling + notifications. GOAWAY is a pretty similar notification to me, especially since the health checking request isn't necessarily service-specific.

There are some subtleties about hop-by-hop vs end-to-end. TCP keepalive, PING, and GOAWAY are hop-by-hop. A health checking service would be end-to-end.

@makeyang
Copy link

based on this discussion, will grpc-java implement a server-push mechanism to push server status to client or it already has done it?

@ejona86
Copy link
Member

ejona86 commented Feb 28, 2016

@makeyang, so, server-push won't be used. There are two parts of the discussion: service health checking and keep alive (aka, TCP health checking). Service health checking will use the gRPC health checking protocol. At some point in the future there will likely be a streaming version of it. The health checking may be used client->server, but there may also be a centralized health checker that coordinates with a load balancer.

To deal with networking issues that break TCP connections, we would need TCP keep alive, HTTP/2 pings, or similar, which was what this discussion was about. The details are still a bit up in the air, but there is agreement we need it. We wouldn't use server-push though.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants