Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep track of background peer tasks #3253

Merged
merged 18 commits into from
Dec 22, 2021
Merged

Conversation

jvff
Copy link
Contributor

@jvff jvff commented Dec 17, 2021

Motivation

When setting up a Client service for a peer connection, some background tasks are spawned. These tasks can panic, and if the spawned task's JoinHandle isn't polled, Zebra might miss the error.

This closes #3199, but currently depends on #3241 being merged first.

Solution

Store a JoinHandle for each task spawned for the peer connection inside the Client type, and poll them inside poll_ready.

Review

@teor2345

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

  • Would it make sense to remove the shutdown channel and use the JoinHandle::abort method instead?

@jvff jvff requested a review from teor2345 December 17, 2021 02:23
@zfnd-bot zfnd-bot bot assigned jvff Dec 17, 2021
@teor2345 teor2345 added A-diagnostics Area: Diagnosing issues or monitoring performance A-network Area: Network protocol updates or fixes C-security Category: Security issues P-Medium labels Dec 17, 2021
@teor2345
Copy link
Contributor

I'm marking this as draft because the tests are failing, and it depends on another PR merging.

@teor2345 teor2345 marked this pull request as draft December 17, 2021 05:05
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good design, but I have some questions about type-safety and test coverage.

I'd like to merge the following PR before we merge this PR:

Do you think we should merge these test/fix PRs before the changes in this PR?

zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/client.rs Show resolved Hide resolved
zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/handshake.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/handshake.rs Outdated Show resolved Hide resolved
@mpguerra mpguerra linked an issue Dec 17, 2021 that may be closed by this pull request
@jvff jvff force-pushed the keep-track-of-peer-heartbeat-tasks branch from 33ba182 to 69b3933 Compare December 17, 2021 14:41
@jvff jvff requested a review from teor2345 December 17, 2021 14:47
@jvff jvff force-pushed the use-mocked-client-handle-in-other-tests branch from c174644 to 057d162 Compare December 18, 2021 22:17
@jvff jvff force-pushed the keep-track-of-peer-heartbeat-tasks branch from 69b3933 to d40ed78 Compare December 18, 2021 22:37
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to make it easier to maintain this code, and avoid the risk of subtle bugs due to future refactors.

I'll see if I can fix it today.

zebra-network/src/peer/client.rs Show resolved Hide resolved
zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/client.rs Outdated Show resolved Hide resolved
zebra-network/src/peer/handshake.rs Show resolved Hide resolved
zebra-network/src/peer/handshake.rs Show resolved Hide resolved
@jvff jvff force-pushed the keep-track-of-peer-heartbeat-tasks branch from d40ed78 to 52e22e2 Compare December 21, 2021 14:45
@jvff jvff requested a review from teor2345 December 21, 2021 14:46
Base automatically changed from use-mocked-client-handle-in-other-tests to main December 21, 2021 20:13
jvff added 11 commits December 21, 2021 20:30
Move the code that's part of the heartbeat task into a separate helper
function.
Keep it closer to where it's actually used, and make it easier to add
new fields to `Client` for the connection and heartbeat tasks.
Prepare it to be able to check for panics or errors from the background
tasks.
Spawn simple timeout tasks as mock connection and heartbeat tasks.
Building a `ClientTestHarness` requires a Tokio runtime to be set up, so
the calls were moved into the `async` block.
Make the code reusable for both background tasks.
Periodically poll it to check if the task has unexpectedly stopped.
The client service should stop if the connection background task has
exited, because then it's not able to receive any replies.
Wrap the background tasks in `Abortable`, so that they can be aborted
through the `ClientTestHarness`.
Check that stopping the background connection task is something that the
`Client` instance detects and handles correctly.
Check that stopping the background heartbeat task is something that the
`Client` instance detects and handles correctly.
jvff and others added 4 commits December 21, 2021 20:30
Will be used later to create background tasks that panic.
Use a mock background connection task that panics immediately, and check
that the `Client` handles it gracefully.
Use a mock background heartbeat task that panics immediately, and check
that the `Client` handles it gracefully.
The previously linked issue was a broad plan to improve Zebra's shutdown
behavior, while the new issue is more specific, and can be scheduled
sooner.

Co-authored-by: teor <[email protected]>
@jvff jvff force-pushed the keep-track-of-peer-heartbeat-tasks branch from 52e22e2 to 235219b Compare December 21, 2021 20:31
@jvff jvff marked this pull request as ready for review December 21, 2021 20:39
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the panic fixes!

@teor2345 teor2345 enabled auto-merge (squash) December 21, 2021 22:43
@teor2345 teor2345 merged commit 1010773 into main Dec 22, 2021
@teor2345 teor2345 deleted the keep-track-of-peer-heartbeat-tasks branch December 22, 2021 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Diagnosing issues or monitoring performance A-network Area: Network protocol updates or fixes C-security Category: Security issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop ignoring panics in the peer receiver and heartbeat tasks
2 participants