Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawn VID with spawn_blocking #2369

Merged
merged 6 commits into from
Jan 10, 2024
Merged

Spawn VID with spawn_blocking #2369

merged 6 commits into from
Jan 10, 2024

Conversation

bfish713
Copy link
Collaborator

@bfish713 bfish713 commented Jan 9, 2024

Closes #2349

This PR:

Spawn the VID calculation on a different thread from the async executor to not block other tasks. From this blog https://ryhl.io/blog/async-what-is-blocking/ from the tokio creator we should not be going more than ~100 micro seconds. VID disperse takes an order of magnitude or longer than this so we should run it on it a dedicated thread for blocking. The blog recommends rayon for cpu tasks but also says if we don't have a lot of CPU work than spawn_blocking is ok

I also increased the async threads to two in the just file because I believe the CI infra has 2 threads (except for maybe self hosted). Personally I think we should remove this restriction entirely.

This PR does not:

Look at other places where we block async tasks for cpu heavy work. We should probably audit the code for other places where we might go a long time between awaits in our code. Potentially signature aggregation, qc validation, and serialization.

It also doesn't make the VID disperse run on multiple threads. As mentioned in the task we should not be dividing up the block on the consensus side. The further parallelization will be done in jellyfish.

Key places to review:

vid.rs where we now spawn a task. Note that the unwrap for the tokio version just passes on any panics that happen during the thread, so it is "safe", i.e it's not adding a new place we can panic.

This should improve the consistency of our CI since we should be able to run faster. I think a key issue with our tests was that all nodes would block each other from running while the leader calculated the vid disperse and blocked the main executor. In a single node setting it's probably not so bad since a leader would probably not have other work to do while it was blocked on vid calculation.

@bfish713 bfish713 self-assigned this Jan 10, 2024
@bfish713 bfish713 added this to the Sprint 7 milestone Jan 10, 2024
@bfish713 bfish713 marked this pull request as ready for review January 10, 2024 18:28
shenkeyao
shenkeyao previously approved these changes Jan 10, 2024
Copy link
Member

@shenkeyao shenkeyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor comment.

let vid_disperse = vid.disperse(encoded_transactions.clone()).unwrap();

let vid_disperse = spawn_blocking(move || {
let vid = VidScheme::new(chunk_size, num_quorum_committee, &srs).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the note below (copied from the PR description) as a comment here?

Note that the unwrap for the tokio version just passes on any panics that happen during the thread, so it is "safe", i.e it's not adding a new place we can panic.

@bfish713 bfish713 merged commit 1d03326 into main Jan 10, 2024
13 checks passed
@bfish713 bfish713 deleted the bf/v/vid-spawn-blocking branch January 10, 2024 19:15
@bfish713 bfish713 changed the title Spawn VID with spawn_blocking WIP Spawn VID with spawn_blocking Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VID] - Parallelize VID disperse calculations
2 participants