Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcast cancellation to only nodes have outstanding child tasks #54312

Merged
merged 25 commits into from
Apr 1, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Mar 26, 2020

Today when canceling a task we broadcast ban/unban requests to all nodes in the cluster. This strategy does not scale well for hierarchical cancellation. With this change, we will track outstanding child requests and broadcast cancellation to only nodes having outstanding child tasks. This change also strengthens the coordination during cancellation. A parent task will no longer be able to send child requests via transport service once it gets cancelled, which streamlines cancellation on the parent task by not requiring to manually check whether the task is cancelled during the execution.

Relates #50990
Supersedes #51157

Co-authored-by: Igor Motov [email protected]
Co-authored-by: Yannick Welsch [email protected]

@dnhatn dnhatn added >enhancement :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. v8.0.0 v7.8.0 labels Mar 26, 2020
@dnhatn dnhatn requested review from imotov, Tim-Brooks and ywelsch March 26, 2020 23:40
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Task Management)

@dnhatn
Copy link
Member Author

dnhatn commented Mar 27, 2020

Please hold off the review. I am investigating some test failures that are related to this change.

@ywelsch
Copy link
Contributor

ywelsch commented Mar 27, 2020

I wonder if, as part of this iteration, we should already wait with the unbanning until all child tasks have been completed (successfully or not).

@dnhatn
Copy link
Member Author

dnhatn commented Mar 27, 2020

I wonder if, as part of this iteration, we should already wait with the unbanning until all child tasks have been completed (successfully or not).

++. I will add it to this PR.

@dnhatn dnhatn requested review from ywelsch, imotov and Tim-Brooks March 27, 2020 18:29
@dnhatn
Copy link
Member Author

dnhatn commented Mar 27, 2020

This is ready for reviews :).

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nhat, I've left some questions and comments.

@dnhatn
Copy link
Member Author

dnhatn commented Mar 31, 2020

@ywelsch @javanna Thanks for reviewing. This is ready again.

@dnhatn dnhatn requested a review from ywelsch March 31, 2020 22:03
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few more nits, no need for another review, however. LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Apr 1, 2020

Thanks Yannick.

@dnhatn dnhatn merged commit ee3d403 into elastic:master Apr 1, 2020
@dnhatn dnhatn deleted the track-child-tasks branch April 1, 2020 15:22
dnhatn added a commit that referenced this pull request Apr 1, 2020
The main task can fail if it is canceled while some of its child tasks
are not forked yet. This commit relaxes the assertion for that situation.

Relates #54312
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Apr 6, 2020
…astic#54312)

Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.

Relates elastic#50990
Supersedes elastic#51157

Co-authored-by: Igor Motov <[email protected]>
Co-authored-by: Yannick Welsch <[email protected]>
dnhatn added a commit that referenced this pull request Apr 6, 2020
dnhatn added a commit that referenced this pull request Apr 6, 2020
…4312)

Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.

Relates #50990
Supersedes #51157

Co-authored-by: Igor Motov <[email protected]>
Co-authored-by: Yannick Welsch <[email protected]>
dnhatn added a commit that referenced this pull request May 6, 2020
We fail to unregister the child node in registerAndExecute if the parent 
task is being canceled. This leads to a bug where a cancel request never
completes.

Closes #55875
Relates #54312
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 6, 2020
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.

Closes elastic#55875
Relates elastic#54312
dnhatn added a commit that referenced this pull request May 7, 2020
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.

Closes #55875
Relates #54312
dnhatn added a commit that referenced this pull request May 7, 2020
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.

Closes #55875
Relates #54312
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants