-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcdutil, leadership: use RequestProgress in watch loop #6961
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
Skipping CI for Draft Pull Request. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6961 +/- ##
==========================================
- Coverage 74.20% 74.16% -0.05%
==========================================
Files 433 433
Lines 46083 46133 +50
==========================================
+ Hits 34196 34213 +17
- Misses 8868 8893 +25
- Partials 3019 3027 +8
Flags with carried forward coverage won't be shown. Click here to find out more. |
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Is this PR reviewable? |
No, it still be need to more test |
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
/check-issue-triage-complete |
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
pkg/election/leadership.go
Outdated
@@ -256,15 +260,15 @@ func (ls *Leadership) Watch(serverCtx context.Context, revision int64) { | |||
// When etcd is not available, the watcher.RequestProgress will block, | |||
// so we check the etcd availability first. | |||
if !etcdutil.IsHealthy(serverCtx, ls.client) { | |||
log.Error("the connection of the leadership watcher is unhealthy", | |||
log.Warn("the connection of the leadership watcher is unhealthy, retry to watch later", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to make the log different.
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Each plan run 10 network-partition and 10 pd-leader-io-hang with 03e95db |
opts := append(lw.opts, clientv3.WithRev(revision)) | ||
watchChan := watcher.Watch(watchChanCtx, lw.key, opts...) | ||
WatchChanLoop: | ||
watcherCtx, cancel := context.WithCancel(clientv3.WithRequireLeader(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
watcherCtx, cancel := context.WithCancel(clientv3.WithRequireLeader(ctx)) | |
watcherCtx, watchChanCancel = context.WithCancel(clientv3.WithRequireLeader(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/utils/etcdutil/etcdutil.go
Outdated
@@ -840,3 +894,18 @@ func (lw *LoopWatcher) SetLoadTimeout(timeout time.Duration) { | |||
func (lw *LoopWatcher) SetLoadBatchSize(size int64) { | |||
lw.loadBatchSize = size | |||
} | |||
|
|||
// CheckWatchChan checks whether the watch channel is blocked for a long time while creating a new watch channel. | |||
func CheckWatchChan(ctx context.Context, cancel context.CancelFunc, done chan struct{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between checkStream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no difference? only avoid cycle import, how about moving checkStream
out from tsoutil
? Which path is suitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it to grpcutil
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
/merge |
@lhy1024: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
This pull request has been accepted and is ready to merge. Commit hash: a3ed8e4
|
* pd-ctl: make `TestShowKeyspaceGroupPrimary` stable (tikv#6903) close tikv#6783 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pd-ctl: fix keyspace group split hang when pd is in pd-mode (tikv#6907) close tikv#6906 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * etcdutil, leadership: use RequestProgress in watch loop (tikv#6961) close tikv#6889 Signed-off-by: lhy1024 <[email protected]> --------- Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* pd-ctl: make `TestShowKeyspaceGroupPrimary` stable (tikv#6903) close tikv#6783 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pd-ctl: fix keyspace group split hang when pd is in pd-mode (tikv#6907) close tikv#6906 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <[email protected]> * mcs/tso: add log to show the progress of group cleaning up (tikv#6971) ref tikv#6589 Add log to show the progress of group cleaning up. Signed-off-by: JmPotato <[email protected]> * etcdutil, leadership: use RequestProgress in watch loop (tikv#6961) close tikv#6889 Signed-off-by: lhy1024 <[email protected]> --------- Signed-off-by: lhy1024 <[email protected]> Signed-off-by: JmPotato <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: JmPotato <[email protected]>
What problem does this PR solve?
Issue Number: Close #6889
What is changed and how does it work?
When connecting to multiple endpoints and injecting pd-leader-io-hang, the watch chan may get stuck, so I added a ticker to send
request progress
to ping whether the watch chan is working normally, and rebuild it if it doesn't receive messages from the watch chan for a long time.And add
CheckWatchChan
to avoid block when creating a watch channel.Use
logFields
to reduce code in log outputCheck List
Tests
run tpcc in pd-leader-io-hang case twenty times
https://tcms.pingcap.net/dashboard/executions/plan/2130511
https://tcms.pingcap.net/dashboard/executions/plan/2100564
watch leader loop can exit normally
qps did not drop to zero and did not meet 9001 error
Release note