Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

halo2_proofs: Avoid caching rotated polynomials in poly::Evaluator #644

Merged
merged 3 commits into from
Sep 10, 2022

Conversation

str4d
Copy link
Contributor

@str4d str4d commented Aug 25, 2022

Previously we used the existing Polynomial::rotate and EvaluationDomain::rotate_extended APIs to rotate the polynomials we query at non-zero rotations, and then stored references to the chunks of each (rotated and unrotated) polynomial as slices. When we reached an AstLeaf, we would then clone the corresponding polynomial chunk.

We now avoid all of this with combined "rotate-and-chunk" APIs, making use of the observation that a rotation is simply a renumbering of the indices of the polynomial coefficients. Given that when we reach an AstLeaf we already needed to allocate, these new APIs have the same number of allocations during AST evaluation, but enable us to completely avoid caching any information about the rotations or rotated polynomials ahead of time.

Previously we used the existing `Polynomial::rotate` and
`EvaluationDomain::rotate_extended` APIs to rotate the polynomials we
query at non-zero rotations, and then stored references to the chunks of
each (rotated and unrotated) polynomial as slices. When we reached an
`AstLeaf`, we would then clone the corresponding polynomial chunk.

We now avoid all of this with combined "rotate-and-chunk" APIs, making
use of the observation that a rotation is simply a renumbering of the
indices of the polynomial coefficients. Given that when we reach an
`AstLeaf` we already needed to allocate, these new APIs have the same
number of allocations during AST evaluation, but enable us to completely
avoid caching any information about the rotations or rotated polynomials
ahead of time.
@str4d
Copy link
Contributor Author

str4d commented Aug 25, 2022

Running heaptrack on the Orchard test circuit::tests::round_trip (which runs MockProver and then creates and verifies a single proof) with a Ryzen 9 5950X, I get the following memory usage measurements before and after this change:

❯ heaptrack --analyze heaptrack.circuit.before.zst | tail -n 7
total runtime: 5.20s.
bytes allocated in total (ignoring deallocations): 2.95GB (567.03MB/s)
calls to allocation functions: 1125346 (216412/s)
temporary memory allocations: 266765 (51300/s)
peak heap memory consumption: 96.05MB
peak RSS (including heaptrack overhead): 680.80MB
total memory leaked: 684.67KB

❯ heaptrack --analyze heaptrack.circuit.after.zst | tail -n 7
total runtime: 5.17s.
bytes allocated in total (ignoring deallocations): 2.93GB (567.36MB/s)
calls to allocation functions: 1114236 (215561/s)
temporary memory allocations: 268533 (51950/s)
peak heap memory consumption: 81.70MB
peak RSS (including heaptrack overhead): 628.31MB
total memory leaked: 547.88KB

So for the Orchard circuit, this is a 14.9% reduction in peak heap memory consumption, and a 7.7% reduction in peak RSS. It also gives a minor performance boost to proving times (Ryzen 9 5950X):

proving/bundle/1        time:   [983.50 ms 990.66 ms 997.86 ms]                           
                        change: [-2.2962% -1.3784% -0.3654%] (p = 0.02 < 0.05)
                        Change within noise threshold.
proving/bundle/2        time:   [984.91 ms 991.11 ms 997.55 ms]                           
                        change: [-1.8530% -1.0189% -0.1863%] (p = 0.04 < 0.05)
                        Change within noise threshold.
proving/bundle/3        time:   [1.3261 s 1.3329 s 1.3399 s]                              
                        change: [-1.5906% -0.7636% +0.0227%] (p = 0.11 > 0.05)
                        No change in performance detected.
proving/bundle/4        time:   [1.6655 s 1.6743 s 1.6830 s]                              
                        change: [-2.5920% -1.8598% -1.0688%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

@jasonmorton
Copy link

jasonmorton commented Aug 25, 2022

On our compute-intensive example this reduces peak memory usage from 107GB to 42.8GB and is about 10% faster (i9-12900K). k=17, 147 columns, lookups with 2^14 rows. The reduction is across both this PR and #642 combined (#642 was about a 20% reduction).

Copy link
Contributor

@daira daira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@str4d str4d merged commit 553584a into main Sep 10, 2022
@str4d str4d deleted the remove-rotated-poly-cache branch September 10, 2022 11:02
@str4d str4d added this to the 0.3.0 milestone Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants