Use `TaskGroup` to ensure all primary / worker tasks are cancelled on error and panic #707

mwtian · 2022-08-07T06:32:12Z

Put all JoinHandles of the primary, worker and consensus tasks into a TaskGroup, so that if one task returns an error or panics, the whole group is cancelled.

If we want to stop the primary or worker, we can just drop the TaskManager associated with the TaskGroup.

For cluster tests, I'm assuming JoinHandles will finish on their own, i.e. checking whether a JoinHandle is_finished() is equivalent to checking if it has been cancelled.
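
To make the intended behaviour concrete, here is a minimal std-only sketch of "one failure cancels the whole group". It is not the `TaskGroup` / `TaskManager` API used in this PR: it swaps async tasks for OS threads and uses cooperative cancellation through a shared flag. `Group`, `CancelGuard`, `long_running` and `run_group_demo` are hypothetical names introduced only for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Outcome = Result<(), String>;

/// Sets the shared flag when dropped while armed. Destructors also run during
/// unwinding, so this covers panics as well as returned errors.
struct CancelGuard {
    flag: Arc<AtomicBool>,
    armed: bool,
}

impl Drop for CancelGuard {
    fn drop(&mut self) {
        if self.armed {
            self.flag.store(true, Ordering::SeqCst);
        }
    }
}

/// Hypothetical stand-in for a task group; not the `TaskGroup` type from the PR.
struct Group {
    cancelled: Arc<AtomicBool>,
    handles: Vec<(String, JoinHandle<Outcome>)>,
}

impl Group {
    fn new() -> Self {
        Group { cancelled: Arc::new(AtomicBool::new(false)), handles: Vec::new() }
    }

    /// Spawns a named task that is expected to poll the flag and exit once set.
    fn spawn<F>(&mut self, name: &str, task: F)
    where
        F: FnOnce(&AtomicBool) -> Outcome + Send + 'static,
    {
        let flag = Arc::clone(&self.cancelled);
        let handle = thread::spawn(move || {
            let mut guard = CancelGuard { flag: Arc::clone(&flag), armed: true };
            let result = task(&*flag);
            // Disarm only on success, so an error or a panic cancels the group.
            guard.armed = result.is_err();
            result
        });
        self.handles.push((name.to_string(), handle));
    }

    /// True once every task has exited (the `is_finished()` check from the text).
    fn all_finished(&self) -> bool {
        self.handles.iter().all(|(_, handle)| handle.is_finished())
    }

    /// Joins every task; a panicked thread is reported as `Err("panicked")`.
    fn join_all(self) -> Vec<(String, Outcome)> {
        self.handles
            .into_iter()
            .map(|(name, h)| (name, h.join().unwrap_or_else(|_| Err("panicked".to_string()))))
            .collect()
    }
}

/// Loops until the group is cancelled, then reports that it was cancelled.
fn long_running(cancelled: &AtomicBool) -> Outcome {
    while !cancelled.load(Ordering::SeqCst) {
        thread::sleep(Duration::from_millis(5));
    }
    Err("cancelled".to_string())
}

fn run_group_demo() -> Vec<(String, Outcome)> {
    let mut group = Group::new();
    group.spawn("primary", long_running);
    group.spawn("worker", long_running);
    group.spawn("consensus", |_| {
        thread::sleep(Duration::from_millis(20));
        panic!("simulated consensus failure");
    });
    // One panic is enough for every handle to finish shortly afterwards.
    while !group.all_finished() {
        thread::sleep(Duration::from_millis(1));
    }
    group.join_all()
}

fn main() {
    for (name, outcome) in run_group_demo() {
        println!("{name}: {outcome:?}");
    }
}
```

In this sketch the simulated consensus panic flips the flag, the two long-running tasks notice it and exit, and every handle then reports `is_finished()`. Threads cannot be aborted from outside, so cancellation here is cooperative; an async task group can instead drop or abort the remaining futures.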

asonnino · 2022-08-08T12:49:11Z

node/src/lib.rs

-        Ok(handles)
+        let (task_group, task_manager) = TaskGroup::new();
+        for (name, handle) in handles {
+            let _ = task_group.spawn(name, handle);


what would be returned here?

what are we spawning exactly?

Added a comment. Basically, a future that can be awaited like the JoinHandle is returned. The returned future is unnecessary here, though, because we will use the task_manager to await task termination.

asonnino · 2022-08-08T12:50:06Z

node/src/lib.rs

@@ -341,8 +347,12 @@ impl Node {
                store.batch_store.clone(),
                metrics.clone(),
            );
-            handles.extend(worker_handles);
+            // TODO: propagate worker task names if needed.


should this be an issue?

Opened an issue and linked here. If other aspects of this PR look ok, I can make the change in this PR too.

mwtian · 2022-08-08T23:33:29Z

Another potential refactor is to simplify shutdown handling: instead of each task implementing logic to handle ReconfigureNotification::Shutdown, we may be able to handle it in one place to shut down all tasks.

huitseeker

Sorry for the late review! This LGTM modulo a rebase,

> Another potential refactor is to simplify shutdown handling: instead of each task implementing logic to handle ReconfigureNotification::Shutdown, we may be able to handle it in one place to shut down all tasks.

I think the goal of the ReconfigureNotification::Shutdown is to allow more ad-hoc graceful shutdown logic than killing the task.
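
A small sketch of the distinction being drawn here, assuming a task that buffers work and must flush it before exiting. The `Notification` enum is hypothetical and only loosely modelled on `ReconfigureNotification::Shutdown`; the example uses std threads and channels rather than the project's async tasks.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical notification type, loosely modelled on the
/// `ReconfigureNotification::Shutdown` variant named in the discussion.
enum Notification {
    Work(u64),
    Shutdown,
}

/// Buffers work items until told to shut down, then runs its own cleanup
/// (returning the buffer stands in for a flush to storage).
fn run_task(rx: Receiver<Notification>) -> Vec<u64> {
    let mut buffered = Vec::new();
    for msg in rx {
        match msg {
            Notification::Work(n) => buffered.push(n),
            Notification::Shutdown => break,
        }
    }
    // Task-specific graceful shutdown logic runs here. It would be skipped
    // entirely if the task were simply killed from outside.
    buffered
}

fn run_shutdown_demo() -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || run_task(rx));
    for n in 1..=3 {
        tx.send(Notification::Work(n)).unwrap();
    }
    tx.send(Notification::Shutdown).unwrap();
    handle.join().unwrap()
}

fn main() {
    println!("flushed on shutdown: {:?}", run_shutdown_demo());
}
```

Because the task sees the shutdown message itself, it gets a chance to hand back everything it buffered. Group-level cancellation gives no such hook, which is one reason the two mechanisms can coexist.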

mwtian · 2022-08-13T00:20:14Z

> Sorry for the late review! This LGTM modulo a rebase,
>
> > Another potential refactor is to simplify shutdown handling: instead of each task implementing logic to handle ReconfigureNotification::Shutdown, we may be able to handle it in one place to shut down all tasks.
>
> I think the goal of the ReconfigureNotification::Shutdown is to allow more ad-hoc graceful shutdown logic than killing the task.

Makes sense.

huitseeker

TY!

Revert "Use `TaskGroup` to ensure all primary / worker tasks are cancelled on error and panic (MystenLabs/narwhal#707)" This reverts commit 693e87979d6be32d29414f1639735760d55c0b21.

mwtian force-pushed the task-group branch from cde731c to 3af89a4 August 7, 2022 06:55

task group

99f2372

mwtian force-pushed the task-group branch from 3af89a4 to 99f2372 August 7, 2022 17:28

mwtian marked this pull request as ready for review August 7, 2022 17:28

mwtian requested a review from asonnino as a code owner August 7, 2022 17:28

fixup! task group

ea922d1

mwtian changed the title ~~[WIP] Use task group to ensure all primary / worker tasks are cancelled on error and panic~~ Use TaskGroup to ensure all primary / worker tasks are cancelled on error and panic Aug 7, 2022

mwtian requested review from akichidis and huitseeker August 7, 2022 18:08

asonnino reviewed Aug 8, 2022

mwtian mentioned this pull request Oct 17, 2022

Use more informative names for worker tasks. MystenLabs/sui#5319

Closed

mwtian added 2 commits August 8, 2022 10:59

fixup! fixup! task group

7ffd27b

Merge branch 'main' into task-group

ab49ba2

fixup! Merge branch 'main' into task-group

599a6d9

huitseeker reviewed Aug 12, 2022

mwtian force-pushed the task-group branch from 1784503 to f6d8456 August 13, 2022 00:19

huitseeker approved these changes Aug 13, 2022

Merge branch 'main' into task-group

6ad78c9

mwtian force-pushed the task-group branch from f6d8456 to 6ad78c9 August 13, 2022 02:14

mwtian merged commit 583ab6a into MystenLabs:main Aug 13, 2022

mwtian deleted the task-group branch August 13, 2022 03:45

huitseeker mentioned this pull request Aug 14, 2022

[refactor] use block_waiter instead of batch_loader #738

Merged

mwtian mentioned this pull request Oct 17, 2022

Fix integration test failure after integrating with TaskGroup MystenLabs/sui#5323

Closed

huitseeker mentioned this pull request Aug 16, 2022

[refactor] The majority of the spawned tasks do not return a JoinHandle #79

Closed

huitseeker pushed a commit to huitseeker/narwhal that referenced this pull request Aug 16, 2022

Use TaskGroup to ensure all primary / worker tasks are cancelled on error and panic (MystenLabs#707)

1d02581

huitseeker pushed a commit that referenced this pull request Aug 16, 2022

Use TaskGroup to ensure all primary / worker tasks are cancelled on error and panic (#707)

693e879

akichidis added a commit that referenced this pull request Aug 18, 2022

Revert "Use TaskGroup to ensure all primary / worker tasks are cancelled on error and panic (#707)"

76e8d55

This reverts commit 693e879.

akichidis mentioned this pull request Aug 18, 2022

[test] showcase node shutdown issue #811

Closed

akichidis added a commit that referenced this pull request Aug 18, 2022

[test] revert task group change (#812)

39aa003

Revert "Use `TaskGroup` to ensure all primary / worker tasks are cancelled on error and panic (#707)" This reverts commit 693e879.

huitseeker pushed a commit that referenced this pull request Aug 25, 2022

[test] revert task group change (#812)

a6ddc6d

Revert "Use `TaskGroup` to ensure all primary / worker tasks are cancelled on error and panic (#707)" This reverts commit 693e879.

mwtian added a commit to mwtian/sui that referenced this pull request Sep 30, 2022

Use TaskGroup to ensure all primary / worker tasks are cancelled on error and panic (MystenLabs/narwhal#707)

e912e03
