-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only create keepalive timeout if there isn't one pending #2304
Conversation
|
Note, an alternative would be to quash any new ping if one is in-flight (early exit instead of conditional create) - I wasn't sure which is preferred. As the data should be reliable, i'd also expect the first ping to eventually get response. One reason not to do this is it makes the keepalive-load from a client predictable and constant. |
Oh, i'm also willing to rework this to be closer to the spec overall if desired, namely a keepalive should be scheduled relative to the last ack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change, with just style issue to fix.
@@ -121,7 +121,7 @@ export class Subchannel { | |||
/** | |||
* Timer reference tracking when the most recent ping will be considered lost | |||
*/ | |||
private keepaliveTimeoutId: NodeJS.Timer; | |||
private keepaliveTimeoutId?: NodeJS.Timer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field should be possibly null
instead of optional. See for example session
and channelzSocketRef
in this class.
Update: I opened #2308, which is a big refactor of the Subchannel class that moves all of this code around. I integrated this change into it, so I don't think it makes sense to keep this PR open. If necessary, we can further refine the keepalive behavior after that PR is merged. |
Sounds good to me! |
From https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md we have the keepalive
time
andtimeout
; the intention being that a keepalive ping is sent at-least everytime
and if there is no bytes of response withintimeout
the connection is considered dead.The current grpc-js implementation has an issue in that if
time
is less than the actual ping response time (due to network, GC, whatever at the remote end) - a new ping would be scheduled, with a new timeout, and the old reference to the timeout is lost and thus will never be cancelled. This inevitably drops the connection.As the intention of the gRFC [in my reading] was 'any byte' of response, I think it is then reasonable to only have a single timeout active at a time, and despite potentially multiple ping() calls, any response will close that timeout until the next ping.
Unsure on intended testing as I couldn't find an example current test- but from detailed gRPC tracing in our application I believe we are running into this issue, an example log:
It is not clear which keepalive is responsible for that timeout, however the connection did get a ping response less than a second prior so I don't think it should be considered a timeout (our settings are 1000ms/10000ms for time and timeout)