manager: Fix hanging Stop method #2203
LGTM
```diff
@@ -361,7 +361,7 @@ func (n *Node) JoinAndStart(ctx context.Context) (err error) {
 		if err != nil {
 			n.stopMu.Lock()
 			// to shutdown transport
-			close(n.stopped)
+			n.cancelFunc()
```
I'm not very familiar with this part of the code, but I didn't follow why this change is needed.
I think this would make it safer to call manager.Stop() even after JoinAndStart errors, because if JoinAndStart errors the channel is closed, and manager.Stop() calls raft.Node.Cancel(), which calls n.cancelFunc.
@cyli thanks for the explanation, I understand better now.
If raftNode.JoinAndStart failed, Stop will block forever because it waits for the manager to start up. To fix this, close the "started" channel even if Run exits early due to an error. Fix the way the collector is initialized so its Stop method won't hang either. Add a test that makes sure the node shuts down cleanly after a failed manager initialization.

Signed-off-by: Aaron Lehmann <[email protected]>
Force-pushed from 3478145 to cc882cf.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2203      +/-   ##
==========================================
- Coverage   60.19%   59.94%   -0.25%
==========================================
  Files         119      119
  Lines       19835    19849      +14
==========================================
- Hits        11939    11898      -41
- Misses       6549     6604      +55
  Partials     1347     1347
```
LGTM
Includes moby/swarmkit#2203

Signed-off-by: Andrea Luzzardi <[email protected]>
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <[email protected]>
Signed-off-by: Tibor Vass <[email protected]>
cc @cyli