Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientv3: add keep-alive to connection #8199

Merged
merged 1 commit into from
Jul 8, 2017

Conversation

sakshamsharma
Copy link
Contributor

this makes the grpc client connection use a keep-alive.

Linked to this issue on Kubernetes

@codecov-io
Copy link

Codecov Report

Merging #8199 into master will decrease coverage by 0.13%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8199      +/-   ##
==========================================
- Coverage    76.4%   76.26%   -0.14%     
==========================================
  Files         344      344              
  Lines       26886    26891       +5     
==========================================
- Hits        20541    20509      -32     
- Misses       4865     4907      +42     
+ Partials     1480     1475       -5
Impacted Files Coverage Δ
clientv3/client.go 82.68% <0%> (-1.49%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-12.13%) ⬇️
pkg/adt/interval_tree.go 81.08% <0%> (-6.61%) ⬇️
etcdserver/api/v3election/election.go 64.7% <0%> (-2.95%) ⬇️
clientv3/watch.go 93% <0%> (-2.25%) ⬇️
rafthttp/peer.go 91.72% <0%> (-1.51%) ⬇️
clientv3/balancer.go 95.08% <0%> (-1.1%) ⬇️
clientv3/lease.go 91.57% <0%> (-0.77%) ⬇️
etcdserver/server.go 79.6% <0%> (-0.7%) ⬇️
rafthttp/stream.go 88.25% <0%> (-0.68%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7cf080...c58464e. Read the comment docs.

@xiang90
Copy link
Contributor

xiang90 commented Jul 1, 2017

@sakshamsharma I have not got time to check gRPC doc. Can you help me to understand if this is TPC keepalive or HTTP2 layer keepalive? Can we also enforce keepalive at server side for gRPC connections?

@sakshamsharma
Copy link
Contributor Author

sakshamsharma commented Jul 5, 2017

This is the connection which takes the KeepAlive as a DialOption (here). It seems to be a gRPC connection, I'm not sure about the underlying connection (there is nothing explicit, like there used to be for the older client)

I tried to make it look like the older client's keepalive. It used to have an explicit http connection as a parameter, which was a struct with a KeepAlive member. In this case, ClientConn does support KeepAlive, so I believe that's the place to enforce it. I'm sorry if I missed addressing something, I could not understand what you meant to ask, by your last sentence.

@xiang90
Copy link
Contributor

xiang90 commented Jul 5, 2017

It seems to be a gRPC connection, I'm not sure about the underlying connection (there is nothing explicit, like there used to be for the older client)

We need to figure out what layer is this keepalive at. We need to have clear documentation at etcd client side.

I'm sorry if I missed addressing something, I could not understand what you meant to ask, by your last sentence.

see https://godoc.org/google.golang.org/grpc#KeepaliveParams.

@heyitsanthony
Copy link
Contributor

@xiang90 it's http2 ping frames

@@ -215,6 +216,12 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts
if c.cfg.DialTimeout > 0 {
opts = []grpc.DialOption{grpc.WithTimeout(c.cfg.DialTimeout)}
}
if c.cfg.DialKeepAlive > 0 {
opts = append(opts, grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: c.cfg.DialKeepAlive * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we set time and timeout the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It defaults to 20 seconds. Doc link

Perhaps I should remove Timeout altogether, since 20 seconds should be enough for a response.

@sakshamsharma sakshamsharma force-pushed the clientv3-keep-alive branch from c58464e to 09c8015 Compare July 5, 2017 20:25
@@ -33,6 +33,10 @@ type Config struct {
// DialTimeout is the timeout for failing to establish a connection.
DialTimeout time.Duration `json:"dial-timeout"`

// DialKeepAlive is the time after which client pings the server to see if transport is
// alive. It waits for this duration for a response before closing the connection.
DialKeepAlive time.Duration `json:"dial-keep-alive"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not keep the naming similar to what grpc has. keepliave usually has two thing:

  • probe internal
  • timeout

users need to know what this time duration is and why we only let them set one of the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dial-keepalive-probe ?
We can add the timeout too though, if you think its useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this similar to gRPC-go naming?

DailKeepAliveTime?

and why not expose the timeout option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Sounds good.

@xiang90
Copy link
Contributor

xiang90 commented Jul 5, 2017

defer to @heyitsanthony. i am a little bit worry about ending up with duplicating every gRPC option in our config structure. but i think we can try to resolve that later.

@@ -215,6 +216,11 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts
if c.cfg.DialTimeout > 0 {
opts = []grpc.DialOption{grpc.WithTimeout(c.cfg.DialTimeout)}
}
if c.cfg.DialKeepAlive > 0 {
opts = append(opts, grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: c.cfg.DialKeepAlive * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time: c.cfg.DialKeepAlive, since it's already a time.Duration?

Copy link
Contributor Author

@sakshamsharma sakshamsharma Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to read the input in seconds, but send it as time.Duration. time.Duration can be anything (it is nanoseconds in this case). Maybe I should change the type of c.cfg.DialKeepAliveTime to int to make it clear? (and add this to comments)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DialTimeout and AutoSyncInterval are also of type time.Duration but they do not multiply by time.Second-- this is inconsistent; just dropping the multiply should be fine

this makes the grpc client connection use a keep-alive.
@sakshamsharma sakshamsharma force-pushed the clientv3-keep-alive branch from 09c8015 to 2a30a75 Compare July 6, 2017 19:56
@sakshamsharma
Copy link
Contributor Author

@heyitsanthony Fixed that inconsistency.

Copy link
Contributor

@heyitsanthony heyitsanthony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks

@sakshamsharma
Copy link
Contributor Author

Not sure about the CI here, but is that a flaky test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants