Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Multithreaded Sumcheck #556

Merged
merged 7 commits into from
Jun 23, 2023
Merged

feat: Multithreaded Sumcheck #556

merged 7 commits into from
Jun 23, 2023

Conversation

ledwards2225
Copy link
Collaborator

@ledwards2225 ledwards2225 commented Jun 22, 2023

Description

Introduces multithreading to the primary loop in Sumcheck. Essentially, each thread is responsible for accumulating the univariate contribution of each sub-relation for a subset of edges per round. After all threads have completed their work, the accumulators from each thread are combined to obtain the contribution over the complete hypercube for the round.

The number of cores is variable across rounds and is determined by min_iterations_per_thread (currently 2^6). Early rounds (for sufficiently large circuits) will use all available cores. As the round size is reduced, we use the minimum number of threads such that each thread is responsible for at least min_iterations_per_thread. This leads to the use of successively fewer threads, down to a single thread in the final rounds.

Note: A few functions in sumcheck round now take as input things that used to be accessed as class members (e.g. univariate_accumulators and extended_edges. This is because multiple threads cannot access the same location in memory. The solution was to simply make a corresponding container for each thread. (This is not expensive because univariate_accumulators and extended_edges are very small objects).

These results show an average 77% CPU time reduction across the Ultra benchmarks as compared to master:
Screenshot 2023-06-23 at 8 24 04 AM

Checklist:

  • I have reviewed my diff in github, line by line.
  • Every change is related to the PR description.
  • The branch has been merged with/rebased against the head of its merge target.
  • There are no unexpected formatting changes, superfluous debug logs, or commented-out code.
  • There are no circuit changes, OR a cryptographer has been assigned for review.
  • New functions, classes, etc. have been documented according to the doxygen comment format. Classes and structs must have @brief describing the intended functionality.
  • If existing code has been modified, such documentation has been added or updated.
  • No superfluous include directives have been added.
  • I have linked to any issue(s) it resolves.
  • I'm happy for the PR to be merged at the reviewer's next convenience.

@ledwards2225 ledwards2225 changed the title Multithreaded Sumcheck feat: Multithreaded Sumcheck Jun 22, 2023
@ledwards2225 ledwards2225 self-assigned this Jun 22, 2023
@ledwards2225 ledwards2225 linked an issue Jun 22, 2023 that may be closed by this pull request
@ledwards2225 ledwards2225 force-pushed the lde/parallel_sumcheck branch from f3fbed9 to 5b76ea4 Compare June 23, 2023 15:01
@@ -35,6 +35,7 @@ bin/$BENCH_TARGET --benchmark_format=json > $BRANCH_RESULTS
# Checkout baseline branch, run benchmarks, save results in json format
echo -e "\nConfiguring and building $BENCH_TARGET in $BASELINE_BRANCH branch..\n"
git checkout master > /dev/null
cd $BASE_DIR
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, just a bug in the script

@@ -36,7 +36,7 @@ static Univariate<FF, max_relation_length> compute_round_univariate(
const RelationParameters<FF>& relation_parameters,
const FF alpha)
{
size_t round_size = 1;
size_t round_size = 2;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was always a bug but it didnt matter because it was being used in loop like this for (size_t edge_idx = 0; edge_idx < round_size; edge_idx += 2) in compute_univariate Round size is at minimum 2, never 1 in practice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have been left over from when we thought of round size as the number of edges. Its not that anymmore. Its the number of rows ( = number of edges * 2)

@ledwards2225 ledwards2225 marked this pull request as ready for review June 23, 2023 15:27
@ledwards2225 ledwards2225 requested a review from codygunton June 23, 2023 15:27
Copy link
Collaborator

@codygunton codygunton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely, compact PR here. We need to tune the parameters and try to understand where we stop getting linear scaling, but this is definitely a nice and significant first step.

size_t iterations_per_thread = round_size / num_threads; // actual iterations per thread

// Constuct univariate accumulator containers; one per thread
std::vector<RelationUnivariates> thread_univariate_accumulators;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just initialize with the size rather than to resize. Maybe doesn't matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

* @param tuple_2 Second summand
*/
template <typename... T>
static constexpr void add_tuples(std::tuple<T...>& tuple_1, const std::tuple<T...>& tuple_2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I think there's built-in folding over the plus operator using something like +...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's likely a lot of room for improvement in a lot of the tuple based methods I've written. Could be a nice puzzle for someone (maybe even me) at some point.

@ledwards2225 ledwards2225 merged commit c4094b1 into master Jun 23, 2023
@ledwards2225 ledwards2225 deleted the lde/parallel_sumcheck branch June 23, 2023 19:27
ludamad pushed a commit to AztecProtocol/aztec-packages that referenced this pull request Jul 22, 2023
ludamad pushed a commit to AztecProtocol/aztec-packages that referenced this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelize sumcheck
2 participants