Lazy commitment key allocation #1121

Closed
lucasxia01 opened this issue Oct 4, 2024 · 0 comments · Fixed by AztecProtocol/aztec-packages#9017

We currently allocate the commitment key up front, when we construct the proving key. This means we hold a large block of memory through many phases that don't need it, such as the DeciderProvingKey constructor and sumcheck.

@lucasxia01 lucasxia01 added this to the Memory Optimizations milestone Oct 4, 2024
codygunton pushed a commit to AztecProtocol/aztec-packages that referenced this issue Oct 7, 2024
Resolves AztecProtocol/barretenberg#1121.

We currently create the commitment key up front, when we create the
proving key. This is unnecessary and wasteful, because the commitment
key accounts for a large share of memory, around 930MB for 2^20
circuits. Instead, we create it only when we need it. For UltraHonk,
that is during Oink and during Gemini. For ClientIVC, we allocate and
free a commitment key for each Oink we run, and again for the final
decider.
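
The allocate-on-demand pattern described above can be sketched roughly as follows. This is a minimal illustration, not the barretenberg implementation: `CommitmentKey`, `ProvingKey`, `MemoryTracker`, `ScopedCommitmentKey`, `run_commit_phase` and `run_sumcheck_phase` are all hypothetical names, and the 64 bytes per point is a placeholder size chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a commitment key: it owns one large buffer.
struct CommitmentKey {
    std::vector<std::uint8_t> points; // placeholder for the point table

    // bytes_per_point = 64 is an assumed placeholder, not a measured value.
    explicit CommitmentKey(std::size_t num_points, std::size_t bytes_per_point = 64)
        : points(num_points * bytes_per_point)
    {
        assert(num_points > 0);
    }
    std::size_t bytes() const { return points.size(); }
};

// Hypothetical proving key that no longer owns a commitment key.
struct ProvingKey {
    std::size_t circuit_size;
};

// Illustration-only counter for bytes held by live commitment keys.
struct MemoryTracker {
    std::size_t live = 0;
    std::size_t peak = 0;
    void add(std::size_t n)
    {
        live += n;
        if (live > peak) { peak = live; }
    }
    void sub(std::size_t n) { live -= n; }
};

// RAII wrapper: the key exists only for the lifetime of this object.
class ScopedCommitmentKey {
  public:
    ScopedCommitmentKey(std::size_t num_points, MemoryTracker& tracker)
        : key_(num_points), tracker_(tracker)
    {
        tracker_.add(key_.bytes());
    }
    ~ScopedCommitmentKey() { tracker_.sub(key_.bytes()); }
    ScopedCommitmentKey(const ScopedCommitmentKey&) = delete;
    ScopedCommitmentKey& operator=(const ScopedCommitmentKey&) = delete;
    const CommitmentKey& get() const { return key_; }

  private:
    CommitmentKey key_;
    MemoryTracker& tracker_;
};

// A phase that commits allocates the key on entry and frees it on exit.
std::size_t run_commit_phase(const ProvingKey& pk, MemoryTracker& tracker)
{
    ScopedCommitmentKey key(pk.circuit_size, tracker);
    // A real prover would commit to polynomials with key.get() here.
    return key.get().bytes();
}

// A phase that does not commit never allocates the key.
// Returns the commitment-key bytes alive while it runs.
std::size_t run_sumcheck_phase(const ProvingKey& pk, const MemoryTracker& tracker)
{
    (void)pk;
    return tracker.live;
}
```

Calling `run_commit_phase` once per Oink and once for the final decider mirrors the ClientIVC behaviour described above: each call pays for a fresh allocation, which is the source of the expected timing cost.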

Peak memory for UltraHonk on a 2^20 circuit drops from 2420MiB to 1786MiB:

<img width="1016" alt="Screenshot 2024-10-04 at 5 33 25 PM"
src="https://github.com/user-attachments/assets/8f5760f8-e2b8-4b86-a0db-1ed68e0acf9f">
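
For reference, the size of the reduction follows directly from the two peak figures quoted above (the percentage is rounded):

```latex
\Delta_{\text{peak}} = 2420\,\text{MiB} - 1786\,\text{MiB} = 634\,\text{MiB},
\qquad
\frac{634}{2420} \approx 26.2\%
```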

ClientIVC memory stays mostly unchanged because we need to keep the
commitment key alive through most of the folding steps.

I expect the bench timing for UltraHonk to be slightly worse given that
we reallocate the commitment key. ClientIVCBench should also be worse
because we do more commitment key allocations.

```
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
ClientIVCBench/Full/6      33391 ms        30977 ms            1 Arithmetic::accumulate=3.89126M Arithmetic::accumulate(t)=7.33056G Auxiliary::accumulate=1.98134M Auxiliary::accumulate(t)=13.0892G COMMIT::databus=108 COMMIT::databus(t)=8.88751M COMMIT::databus_inverses=36 COMMIT::databus_inverses(t)=11.2725M COMMIT::ecc_op_wires=48 COMMIT::ecc_op_wires(t)=38.6915M COMMIT::lookup_counts_tags=12 COMMIT::lookup_counts_tags(t)=193.353M COMMIT::lookup_inverses=12 COMMIT::lookup_inverses(t)=255.969M COMMIT::wires=24 COMMIT::wires(t)=2.21199G COMMIT::z_perm=12 COMMIT::z_perm(t)=2.32652G DatabusRead::accumulate=447 DatabusRead::accumulate(t)=1.53355M Decider::construct_proof=1 Decider::construct_proof(t)=1.68437G DeciderProvingKey(Circuit&)=12 DeciderProvingKey(Circuit&)(t)=2.86109G DeltaRange::accumulate=1.87876M DeltaRange::accumulate(t)=4.1979G ECCVMProver(CircuitBuilder&)=1 ECCVMProver(CircuitBuilder&)(t)=229.598M ECCVMProver::construct_proof=1 ECCVMProver::construct_proof(t)=2.57466G Elliptic::accumulate=183.692k Elliptic::accumulate(t)=452.417M Goblin::merge=23 Goblin::merge(t)=117.072M Lookup::accumulate=1.66365M Lookup::accumulate(t)=3.69193G MegaFlavor::get_row=6.18565M MegaFlavor::get_row(t)=4.20034G OinkProver::execute_grand_product_computation_round=12 OinkProver::execute_grand_product_computation_round(t)=3.59544G OinkProver::execute_log_derivative_inverse_round=12 OinkProver::execute_log_derivative_inverse_round(t)=2.48433G OinkProver::execute_preamble_round=12 OinkProver::execute_preamble_round(t)=274.895k OinkProver::execute_sorted_list_accumulator_round=12 OinkProver::execute_sorted_list_accumulator_round(t)=772.217M OinkProver::execute_wire_commitments_round=12 OinkProver::execute_wire_commitments_round(t)=1.68854G OinkProver::generate_alphas_round=12 OinkProver::generate_alphas_round(t)=3.58973M Permutation::accumulate=10.6427M Permutation::accumulate(t)=40.3554G PoseidonExt::accumulate=30.452k PoseidonExt::accumulate(t)=76.5906M 
PoseidonInt::accumulate=210.454k PoseidonInt::accumulate(t)=371.576M ProtogalaxyProver::prove=11 ProtogalaxyProver::prove(t)=19.5665G ProtogalaxyProver_::combiner_quotient_round=11 ProtogalaxyProver_::combiner_quotient_round(t)=8.3951G ProtogalaxyProver_::compute_row_evaluations=11 ProtogalaxyProver_::compute_row_evaluations(t)=1.72459G ProtogalaxyProver_::perturbator_round=11 ProtogalaxyProver_::perturbator_round(t)=2.61146G ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key=11 ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key(t)=7.8871G ProtogalaxyProver_::update_target_sum_and_fold=11 ProtogalaxyProver_::update_target_sum_and_fold(t)=672.681M TranslatorCircuitBuilder::constructor=1 TranslatorCircuitBuilder::constructor(t)=32.7314M TranslatorProver=1 TranslatorProver(t)=46.9982M TranslatorProver::construct_proof=1 TranslatorProver::construct_proof(t)=843.494M batch_mul_with_endomorphism=16 batch_mul_with_endomorphism(t)=405.64M commit=542 commit(t)=6.73009G commit_sparse=36 commit_sparse(t)=11.2568M compute_combiner=11 compute_combiner(t)=7.9922G compute_perturbator=11 compute_perturbator(t)=2.61115G compute_univariate=51 compute_univariate(t)=2.16081G construct_circuits=12 construct_circuits(t)=4.36072G pippenger=214 pippenger(t)=100.623M pippenger_unsafe_optimized_for_non_dyadic_polys=542 pippenger_unsafe_optimized_for_non_dyadic_polys(t)=6.6333G
Benchmarking lock deleted.
client_ivc_bench.json                 100% 6936   183.4KB/s   00:00    
function                                  ms     % sum
construct_circuits(t)                   4361    13.53%
DeciderProvingKey(Circuit&)(t)          2861     8.88%
ProtogalaxyProver::prove(t)            19566    60.69%
Decider::construct_proof(t)             1684     5.22%
ECCVMProver(CircuitBuilder&)(t)          230     0.71%
ECCVMProver::construct_proof(t)         2575     7.99%
TranslatorProver::construct_proof(t)     843     2.62%
Goblin::merge(t)                         117     0.36%

Total time accounted for: 32237ms/33391ms = 96.55%

Major contributors:
function                                  ms    % sum
commit(t)                               6730   20.88%
compute_combiner(t)                     7992   24.79%
compute_perturbator(t)                  2611    8.10%
compute_univariate(t)                   2161    6.70%

Breakdown of ProtogalaxyProver::prove:
ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key(t)    7887    40.31%
ProtogalaxyProver_::perturbator_round(t)                         2611    13.35%
ProtogalaxyProver_::combiner_quotient_round(t)                   8395    42.91%
ProtogalaxyProver_::update_target_sum_and_fold(t)                 673     3.44%

Relation contributions (times to be interpreted relatively):
Total time accounted for (ms):    69567
operation                       ms     % sum
Arithmetic::accumulate(t)     7331    10.54%
Permutation::accumulate(t)   40355    58.01%
Lookup::accumulate(t)         3692     5.31%
DeltaRange::accumulate(t)     4198     6.03%
Elliptic::accumulate(t)        452     0.65%
Auxiliary::accumulate(t)     13089    18.82%
EccOp::accumulate(t)             0     0.00%
DatabusRead::accumulate(t)       2     0.00%
PoseidonExt::accumulate(t)      77     0.11%
PoseidonInt::accumulate(t)     372     0.53%

Commitment contributions:
Total time accounted for (ms):     5047
operation                          ms     % sum
COMMIT::wires(t)                 2212    43.83%
COMMIT::z_perm(t)                2327    46.10%
COMMIT::databus(t)                  9     0.18%
COMMIT::ecc_op_wires(t)            39     0.77%
COMMIT::lookup_inverses(t)        256     5.07%
COMMIT::databus_inverses(t)        11     0.22%
COMMIT::lookup_counts_tags(t)     193     3.83%
```
AztecBot pushed a commit that referenced this issue Oct 8, 2024
Resolves #1121.
