pd: cancel call when refreshing client #2669

Merged
merged 5 commits into from
Jan 12, 2018

Conversation

BusyJay
Member

@BusyJay BusyJay commented Jan 10, 2018

This is a quick fix for a possibly stale heartbeat streaming call.

A streaming call won't report an error when messages are silently dropped in the network. However, the sync requests in the pd client can detect this error by timeout. When the error is detected, the client is refreshed. At that point, the streaming call created by the old client should be canceled automatically. But this doesn't happen as expected, so this PR cancels it explicitly.

We need to add a test case to test this with iptables.

The issue needs to be investigated further too. /cc tikv/grpc-rs#150
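
The behaviour described above can be modelled with a small std-only sketch. This is not the grpc-rs API and not the code in this PR: `StreamCall`, `Client`, `refresh`, and `generation` are hypothetical names chosen for illustration, and the shared flag only stands in for whatever the real call handle does on cancel. It shows the idea of cancelling the old streaming call explicitly at refresh time instead of relying on it being cleaned up automatically.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a streaming call handle (not the grpc-rs type).
/// Clones share one cancellation flag, so every holder sees the cancel.
#[derive(Clone)]
pub struct StreamCall {
    cancelled: Arc<AtomicBool>,
}

impl StreamCall {
    pub fn new() -> Self {
        StreamCall { cancelled: Arc::new(AtomicBool::new(false)) }
    }

    /// Mark the call as cancelled.
    pub fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

/// Hypothetical client wrapper that owns one heartbeat stream.
pub struct Client {
    generation: u64,
    heartbeat: StreamCall,
}

impl Client {
    pub fn connect() -> Self {
        Client { generation: 0, heartbeat: StreamCall::new() }
    }

    /// Hand out another handle to the current heartbeat stream.
    pub fn heartbeat(&self) -> StreamCall {
        self.heartbeat.clone()
    }

    pub fn generation(&self) -> u64 {
        self.generation
    }

    /// Called after a sync request has timed out (assumed trigger).
    pub fn refresh(&mut self) {
        // Cancel explicitly: other holders of the old handle would otherwise
        // keep waiting on a stream that never reports an error.
        self.heartbeat.cancel();
        self.heartbeat = StreamCall::new();
        self.generation += 1;
    }
}
```

In this model, a handle taken before `refresh` observes the cancellation afterwards, while the handle returned by the refreshed client starts out live.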

@BusyJay BusyJay added type/bug The issue is confirmed as a bug. component/gRPC Component: gRPC labels Jan 10, 2018
@siddontang
Contributor

Can we use a failpoint to reproduce it? E.g., force breaking the loop and taking the receiver.

@siddontang siddontang added this to the 2018 Q1 milestone Jan 11, 2018
@overvenus
Member

@siddontang
Contributor

Do we need to call cancel on raft client?

I think we do. Maybe we should add a keep timeout too, but @disksing ran into some problems with it.

@BusyJay
Member Author

BusyJay commented Jan 11, 2018

I don't think so. If the client is refreshed in the raft client, the streaming call is dropped and recreated immediately.
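
As a rough illustration of the "dropped and recreated immediately" case, here is a std-only sketch. It assumes that dropping the handle is what ends the old call; `RaftStream`, `RaftConn`, and the live-stream counter are hypothetical and are not taken from the TiKV raft client.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stream handle that counts as live until it is dropped.
pub struct RaftStream {
    live: Arc<AtomicUsize>,
}

impl RaftStream {
    pub fn open(live: &Arc<AtomicUsize>) -> Self {
        live.fetch_add(1, Ordering::SeqCst);
        RaftStream { live: Arc::clone(live) }
    }
}

impl Drop for RaftStream {
    // Dropping the handle ends the stream in this model.
    fn drop(&mut self) {
        self.live.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical connection that is the sole owner of its stream.
pub struct RaftConn {
    live: Arc<AtomicUsize>,
    stream: RaftStream,
}

impl RaftConn {
    pub fn new(live: Arc<AtomicUsize>) -> Self {
        let stream = RaftStream::open(&live);
        RaftConn { live, stream }
    }

    /// Replace the stream; the assignment drops the old handle right away.
    pub fn refresh(&mut self) {
        self.stream = RaftStream::open(&self.live);
    }

    pub fn live_streams(&self) -> usize {
        self.live.load(Ordering::SeqCst)
    }
}
```

Because the connection is the only owner here, no stale stream survives a refresh, which is why no separate cancel step appears in this sketch.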

@BusyJay
Member Author

BusyJay commented Jan 11, 2018

E.g., force breaking the loop and taking the receiver.

I don't get it. Which loop should be broken? And how and why would we take the receiver?

@siddontang
Contributor

What I mean is not returning Ready and going here: https://github.com/BusyJay/tikv/blob/709096a3ec8e0011b6ac4e434e89f3ca2d3a2278/src/pd/util.rs#L70

Is it OK?

@BusyJay
Member Author

BusyJay commented Jan 11, 2018

I don't see any connection between what you mention and the bug this PR is trying to fix. Actually, the poll method is never called again until cancel is called.

@siddontang
Contributor

Got it.

@BusyJay
Member Author

BusyJay commented Jan 11, 2018

PTAL

@siddontang
Contributor

LGTM

Member

@overvenus overvenus left a comment

lgtm

@BusyJay
Member Author

BusyJay commented Jan 12, 2018

/run-integration-tests

@overvenus
Member

/rebuild

@BusyJay
Member Author

BusyJay commented Jan 12, 2018

/run-integration-tests

@ngaut ngaut merged commit bf9ffda into tikv:master Jan 12, 2018
@BusyJay BusyJay deleted the fix-hb-receiver-fresh branch January 12, 2018 11:05
overvenus pushed a commit to overvenus/tikv that referenced this pull request Jan 15, 2018
* pd: cancel call when refreshing client
overvenus added a commit that referenced this pull request Jan 18, 2018
* Cargo: update prometheus to v0.3.7 (#2684)

* Cargo: update hyper to v0.9.18 (#2686)

* pd: cancel call when refreshing client (#2669)

* ci-build/test.sh: add execute permission (#2472)
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
* pd: cancel call when refreshing client
Labels
component/gRPC Component: gRPC type/bug The issue is confirmed as a bug.
4 participants