Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: supports to add gRPC dial options #2035

Merged
merged 7 commits into from
Dec 19, 2019
Merged

Conversation

nolouch
Copy link
Contributor

@nolouch nolouch commented Dec 18, 2019

Signed-off-by: nolouch [email protected]

What problem does this PR solve?

we test
After all 3 instances of PD are killed in AWS(k8s environment), it takes a long time (15 minutes) for TiDB server instances to reconnect to new PD instances. and we found the stale TCP connection after all pod IP is changed.

 we see the connection is still establish, and that ip does not exist after kill.
10.0.48.116 is not exist, but the tcp is establisted.

Tue Dec 17 12:54:36 UTC 2019
tcp        0      1 10.0.38.181:53424       172.20.62.78:2379       SYN_SENT    1/tidb-server
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0   1140 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server
Tue Dec 17 12:54:37 UTC 2019
tcp        0      1 10.0.38.181:53424       172.20.62.78:2379       SYN_SENT    1/tidb-server
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0   1153 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server
Tue Dec 17 12:54:37 UTC 2019
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53488       172.20.62.78:2379       ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0   1153 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server

....

Tue Dec 17 13:07:15 UTC 2019
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53488       172.20.62.78:2379       ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0  26868 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server
Tue Dec 17 13:07:16 UTC 2019
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53488       172.20.62.78:2379       ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0  26868 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server
Tue Dec 17 13:07:16 UTC 2019
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53488       172.20.62.78:2379       ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0  26868 10.0.38.181:35378       10.0.48.116:2379        ESTABLISHED 1/tidb-server
Tue Dec 17 13:07:17 UTC 2019
tcp        0      0 10.0.38.181:53302       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53304       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53488       172.20.62.78:2379       ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:53300       10.0.22.167:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:37554       10.0.46.225:2379        ESTABLISHED 1/tidb-server
tcp        0      0 10.0.38.181:41754       10.0.55.94:2379         ESTABLISHED 1/tidb-server

This problem same as pingcap/tidb#7099. may k8s CNI dropping all packets send to the removed node(Indeterminate), that cause a stall conneciton, until kernel TCP retransmission times out and closes the connection.

What is changed and how it works?

supports add gRPC dial options and setting gRPC KeepAlive in tidb's pd client, this problem fixed.

Check List

Tests

  • Unit test

@nolouch nolouch requested review from disksing and rleungx December 18, 2019 07:45
@nolouch nolouch added the component/client Client logic. label Dec 18, 2019
@nolouch nolouch requested a review from shafreeck December 18, 2019 08:08
@codecov-io
Copy link

codecov-io commented Dec 18, 2019

Codecov Report

Merging #2035 into master will decrease coverage by 0.13%.
The diff coverage is 77.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2035      +/-   ##
==========================================
- Coverage   76.94%   76.81%   -0.14%     
==========================================
  Files         183      183              
  Lines       18335    18335              
==========================================
- Hits        14108    14084      -24     
- Misses       3163     3181      +18     
- Partials     1064     1070       +6
Impacted Files Coverage Δ
server/region_syncer/client.go 82.27% <77.77%> (+2.53%) ⬆️
server/schedulers/shuffle_hot_region.go 62.62% <0%> (-5.06%) ⬇️
server/member/leader.go 76.53% <0%> (-4.6%) ⬇️
server/kv/etcd_kv.go 81.81% <0%> (-2.6%) ⬇️
client/client.go 72.23% <0%> (-1.82%) ⬇️
server/core/storage.go 74.25% <0%> (-1.2%) ⬇️
server/config/option.go 90.32% <0%> (-0.93%) ⬇️
pkg/btree/btree.go 86.84% <0%> (-0.81%) ⬇️
server/handler.go 48.91% <0%> (-0.44%) ⬇️
server/server.go 82.65% <0%> (-0.39%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2074dbd...5c609c1. Read the comment docs.

Signed-off-by: nolouch <[email protected]>
client/client.go Outdated Show resolved Hide resolved
Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@lhy1024 lhy1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest LGTM

client/client.go Outdated Show resolved Hide resolved
// ClientOption configures client.
type ClientOption func(c *client)

// WithGRPCDialOptions configures the client with gRPC dial options.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, we have no gRPC related parameter exposed in Client, to keep the interface implementation-independent, how about defining a new Option type and map it to grpc.DialOption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's means if I want to add a new gRPC option, I need to map it again and then update pd in tidb again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: nolouch <[email protected]>
Signed-off-by: nolouch <[email protected]>
@nolouch nolouch added needs-cherry-pick-release-3.0 The PR needs to cherry pick to release-3.0 branch. needs-cherry-pick-release-3.1 The PR needs to cherry pick to release-3.1 branch. labels Dec 19, 2019
@nolouch nolouch added this to the v4.0.0-beta milestone Dec 19, 2019
@nolouch
Copy link
Contributor Author

nolouch commented Dec 19, 2019

PTAL @disksing @lhy1024

@nolouch
Copy link
Contributor Author

nolouch commented Dec 19, 2019

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 19, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Dec 19, 2019

Your auto merge job has been accepted, waiting for 2040

@sre-bot
Copy link
Contributor

sre-bot commented Dec 19, 2019

/run-all-tests

@sre-bot sre-bot merged commit 4d65bbe into tikv:master Dec 19, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Dec 19, 2019

cherry pick to release-3.0 failed

@sre-bot
Copy link
Contributor

sre-bot commented Dec 19, 2019

cherry pick to release-3.1 failed

nolouch added a commit to nolouch/pd that referenced this pull request Dec 23, 2019
sre-bot pushed a commit that referenced this pull request Dec 23, 2019
sre-bot pushed a commit to sre-bot/pd that referenced this pull request Dec 23, 2019
sre-bot added a commit that referenced this pull request Dec 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/client Client logic. needs-cherry-pick-release-3.0 The PR needs to cherry pick to release-3.0 branch. needs-cherry-pick-release-3.1 The PR needs to cherry pick to release-3.1 branch. status/can-merge Indicates a PR has been approved by a committer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants