-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Multithreaded Sumcheck #556
Conversation
f3fbed9
to
5b76ea4
Compare
@@ -35,6 +35,7 @@ bin/$BENCH_TARGET --benchmark_format=json > $BRANCH_RESULTS | |||
# Checkout baseline branch, run benchmarks, save results in json format | |||
echo -e "\nConfiguring and building $BENCH_TARGET in $BASELINE_BRANCH branch..\n" | |||
git checkout master > /dev/null | |||
cd $BASE_DIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, just a bug in the script
@@ -36,7 +36,7 @@ static Univariate<FF, max_relation_length> compute_round_univariate( | |||
const RelationParameters<FF>& relation_parameters, | |||
const FF alpha) | |||
{ | |||
size_t round_size = 1; | |||
size_t round_size = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was always a bug but it didnt matter because it was being used in loop like this for (size_t edge_idx = 0; edge_idx < round_size; edge_idx += 2)
in compute_univariate
Round size is at minimum 2, never 1 in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may have been left over from when we thought of round size as the number of edges. Its not that anymmore. Its the number of rows ( = number of edges * 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely, compact PR here. We need to tune the parameters and try to understand where we stop getting linear scaling, but this is definitely a nice and significant first step.
size_t iterations_per_thread = round_size / num_threads; // actual iterations per thread | ||
|
||
// Constuct univariate accumulator containers; one per thread | ||
std::vector<RelationUnivariates> thread_univariate_accumulators; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just initialize with the size rather than to resize. Maybe doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
* @param tuple_2 Second summand | ||
*/ | ||
template <typename... T> | ||
static constexpr void add_tuples(std::tuple<T...>& tuple_1, const std::tuple<T...>& tuple_2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I think there's built-in folding over the plus operator using something like +...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's likely a lot of room for improvement in a lot of the tuple based methods I've written. Could be a nice puzzle for someone (maybe even me) at some point.
Description
Introduces multithreading to the primary loop in Sumcheck. Essentially, each thread is responsible for accumulating the univariate contribution of each sub-relation for a subset of edges per round. After all threads have completed their work, the accumulators from each thread are combined to obtain the contribution over the complete hypercube for the round.
The number of cores is variable across rounds and is determined by
min_iterations_per_thread
(currently 2^6). Early rounds (for sufficiently large circuits) will use all available cores. As the round size is reduced, we use the minimum number of threads such that each thread is responsible for at leastmin_iterations_per_thread
. This leads to the use of successively fewer threads, down to a single thread in the final rounds.Note: A few functions in sumcheck round now take as input things that used to be accessed as class members (e.g.
univariate_accumulators
andextended_edges
. This is because multiple threads cannot access the same location in memory. The solution was to simply make a corresponding container for each thread. (This is not expensive becauseunivariate_accumulators
andextended_edges
are very small objects).These results show an average 77% CPU time reduction across the Ultra benchmarks as compared to
master
:Checklist:
@brief
describing the intended functionality.include
directives have been added.