-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clientv3: add keep-alive to connection #8199
Conversation
2e0ef6f
to
c58464e
Compare
Codecov Report
@@ Coverage Diff @@
## master #8199 +/- ##
==========================================
- Coverage 76.4% 76.26% -0.14%
==========================================
Files 344 344
Lines 26886 26891 +5
==========================================
- Hits 20541 20509 -32
- Misses 4865 4907 +42
+ Partials 1480 1475 -5
Continue to review full report at Codecov.
|
@sakshamsharma I have not got time to check gRPC doc. Can you help me to understand if this is TPC keepalive or HTTP2 layer keepalive? Can we also enforce keepalive at server side for gRPC connections? |
This is the connection which takes the KeepAlive as a DialOption (here). It seems to be a gRPC connection, I'm not sure about the underlying connection (there is nothing explicit, like there used to be for the older client) I tried to make it look like the older client's keepalive. It used to have an explicit http connection as a parameter, which was a struct with a KeepAlive member. In this case, ClientConn does support KeepAlive, so I believe that's the place to enforce it. I'm sorry if I missed addressing something, I could not understand what you meant to ask, by your last sentence. |
We need to figure out what layer is this keepalive at. We need to have clear documentation at etcd client side.
see https://godoc.org/google.golang.org/grpc#KeepaliveParams. |
@xiang90 it's http2 ping frames |
clientv3/client.go
Outdated
@@ -215,6 +216,12 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts | |||
if c.cfg.DialTimeout > 0 { | |||
opts = []grpc.DialOption{grpc.WithTimeout(c.cfg.DialTimeout)} | |||
} | |||
if c.cfg.DialKeepAlive > 0 { | |||
opts = append(opts, grpc.WithKeepaliveParams(keepalive.ClientParameters{ | |||
Time: c.cfg.DialKeepAlive * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we set time and timeout the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It defaults to 20 seconds. Doc link
Perhaps I should remove Timeout
altogether, since 20 seconds should be enough for a response.
c58464e
to
09c8015
Compare
clientv3/config.go
Outdated
@@ -33,6 +33,10 @@ type Config struct { | |||
// DialTimeout is the timeout for failing to establish a connection. | |||
DialTimeout time.Duration `json:"dial-timeout"` | |||
|
|||
// DialKeepAlive is the time after which client pings the server to see if transport is | |||
// alive. It waits for this duration for a response before closing the connection. | |||
DialKeepAlive time.Duration `json:"dial-keep-alive"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not keep the naming similar to what grpc has. keepliave usually has two thing:
- probe internal
- timeout
users need to know what this time duration is and why we only let them set one of the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dial-keepalive-probe
?
We can add the timeout too though, if you think its useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make this similar to gRPC-go naming?
DailKeepAliveTime?
and why not expose the timeout option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Sounds good.
defer to @heyitsanthony. i am a little bit worry about ending up with duplicating every gRPC option in our config structure. but i think we can try to resolve that later. |
clientv3/client.go
Outdated
@@ -215,6 +216,11 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts | |||
if c.cfg.DialTimeout > 0 { | |||
opts = []grpc.DialOption{grpc.WithTimeout(c.cfg.DialTimeout)} | |||
} | |||
if c.cfg.DialKeepAlive > 0 { | |||
opts = append(opts, grpc.WithKeepaliveParams(keepalive.ClientParameters{ | |||
Time: c.cfg.DialKeepAlive * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time: c.cfg.DialKeepAlive,
since it's already a time.Duration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to read the input in seconds, but send it as time.Duration. time.Duration can be anything (it is nanoseconds in this case). Maybe I should change the type of c.cfg.DialKeepAliveTime
to int to make it clear? (and add this to comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DialTimeout
and AutoSyncInterval
are also of type time.Duration
but they do not multiply by time.Second
-- this is inconsistent; just dropping the multiply should be fine
this makes the grpc client connection use a keep-alive.
09c8015
to
2a30a75
Compare
@heyitsanthony Fixed that inconsistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks
Not sure about the CI here, but is that a flaky test? |
this makes the grpc client connection use a keep-alive.
Linked to this issue on Kubernetes