Broadcast cancellation only to nodes that have outstanding child tasks #54312
Pinging @elastic/es-distributed (:Distributed/Task Management)
Please hold off on the review. I am investigating some test failures that are related to this change.
I wonder if, as part of this iteration, we should already hold off on unbanning until all child tasks have completed (successfully or not).
++. I will add it to this PR.
This is ready for review :).
Thanks Nhat, I've left some questions and comments.
modules/reindex/src/test/java/org/elasticsearch/index/reindex/CancelTests.java
...ava/org/elasticsearch/action/admin/cluster/node/tasks/cancel/TransportCancelTasksAction.java
server/src/test/java/org/elasticsearch/search/SearchCancellationIT.java
server/src/test/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java
I've left a few more nits, no need for another review, however. LGTM
client/rest-high-level/src/main/java/org/elasticsearch/client/tasks/CancelTasksRequest.java
...c/main/java/org/elasticsearch/action/admin/cluster/node/tasks/cancel/CancelTasksRequest.java
server/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestCancelTasksAction.java
Thanks Yannick.
The main task can fail if it is canceled while some of its child tasks are not forked yet. This commit relaxes the assertion for that situation. Relates #54312
…astic#54312) Today when canceling a task we broadcast ban/unban requests to all nodes in the cluster. This strategy does not scale well for hierarchical cancellation. With this change, we will track outstanding child requests and broadcast the cancellation to only nodes that have outstanding child tasks. This change also prevents a parent task from sending child requests once it got canceled. Relates elastic#50990 Supersedes elastic#51157 Co-authored-by: Igor Motov Co-authored-by: Yannick Welsch
…4312) Today when canceling a task we broadcast ban/unban requests to all nodes in the cluster. This strategy does not scale well for hierarchical cancellation. With this change, we will track outstanding child requests and broadcast the cancellation to only nodes that have outstanding child tasks. This change also prevents a parent task from sending child requests once it got canceled. Relates #50990 Supersedes #51157 Co-authored-by: Igor Motov Co-authored-by: Yannick Welsch
We fail to unregister the child node in registerAndExecute if the parent task is being canceled. This leads to a bug where a cancel request never completes. Closes elastic#55875 Relates elastic#54312
Today, when canceling a task, we broadcast ban/unban requests to all nodes in the cluster. This strategy does not scale well for hierarchical cancellation. With this change, we track outstanding child requests and broadcast the cancellation only to nodes that have outstanding child tasks. This change also strengthens coordination during cancellation: a parent task can no longer send child requests via the transport service once it is cancelled, which streamlines cancellation on the parent task by removing the need to check manually during execution whether the task has been cancelled.
Relates #50990
Supersedes #51157
Co-authored-by: Igor Motov
Co-authored-by: Yannick Welsch
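
For readers skimming the thread, the bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `ChildTaskTracker`, `registerChild`, `unregisterChild`, and `cancel` are hypothetical names, and the real change lives in the task manager and transport service. The sketch counts outstanding child requests per node, refuses new child requests once the parent is cancelled, returns only the nodes that still need a ban, and runs a callback (for example, to send unban requests) once every outstanding child has completed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not the Elasticsearch implementation: tracks which nodes
 * still have outstanding child requests for a single parent task.
 */
final class ChildTaskTracker {
    // node id -> number of child requests sent to that node and not yet completed
    private final Map<String, Integer> outstanding = new HashMap<>();
    private boolean cancelled = false;
    private Runnable onAllChildrenDone = null;

    /** Registers a child request for the node; refuses once the parent is cancelled. */
    synchronized void registerChild(String nodeId) {
        if (cancelled) {
            throw new IllegalStateException("parent task is cancelled; cannot send child request to " + nodeId);
        }
        outstanding.merge(nodeId, 1, Integer::sum);
    }

    /** Marks one child request on the node as completed (successfully or not). */
    void unregisterChild(String nodeId) {
        Runnable callback = null;
        synchronized (this) {
            // Decrement the count; returning null removes the entry when it reaches zero.
            outstanding.computeIfPresent(nodeId, (id, count) -> count == 1 ? null : count - 1);
            if (cancelled && outstanding.isEmpty() && onAllChildrenDone != null) {
                callback = onAllChildrenDone;
                onAllChildrenDone = null;
            }
        }
        // Run the callback outside the lock so it cannot block other callers.
        if (callback != null) {
            callback.run();
        }
    }

    /**
     * Cancels the parent and returns only the nodes that need a ban request.
     * The callback runs once every outstanding child has completed.
     */
    Set<String> cancel(Runnable whenChildrenDone) {
        Set<String> nodesToBan;
        boolean runNow;
        synchronized (this) {
            cancelled = true;
            nodesToBan = new HashSet<>(outstanding.keySet());
            runNow = outstanding.isEmpty();
            if (!runNow) {
                onAllChildrenDone = whenChildrenDone;
            }
        }
        if (runNow) {
            whenChildrenDone.run();
        }
        return nodesToBan;
    }
}
```

In this sketch, a caller that registers a child must always pair it with `unregisterChild`, including on failure paths; otherwise the completion callback never fires, which is the same shape of problem as the cancel request that never completes mentioned in the follow-up commit above.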