feat: Use structured polys to reduce prover memory (#8587)
We use the new structured polynomial class to reduce the Prover's memory usage. For ClientIVCBench, memory drops by about 36.4%, from 2377.99MiB to 1511.34MiB. The savings come from restricting polynomials to smaller sizes:

- `lagrange_first` and `lagrange_last` each allocate only 1 element.
- Each gate selector allocates only its fixed block size, so the 8 gate selectors together take up about as much memory as a single full-size selector. (Caveat: for now, the arithmetic selector also spans the aux block.)
- The 5 ecc_op polynomials are restricted to the ecc_op block.
- 9 of the 10 databus polynomials are restricted to `MAX_DATABUS_SIZE`.
- The 4 table polynomials, along with the lookup read counts and read tags polynomials, are restricted to `MAX_LOOKUP_TABLES_SIZE`.
- The inverse polynomials are also restricted, but the details are more involved and are not covered here.

Overall, this shrinks 28 of the 54 polynomials, which accounts for the 867MiB drop. There's more juice to be squeezed here, but this is a large reduction that should essentially get us there.

Before:
<img width="1331" alt="Screenshot 2024-09-20 at 5 00 27 PM" src="https://github.com/user-attachments/assets/7572a5d2-4fa9-4b4f-af1d-7885260d6756">

After:
<img width="1363" alt="Screenshot 2024-09-26 at 10 03 54 AM" src="https://github.com/user-attachments/assets/aed64b1d-862c-4a21-9e32-160993d1f5c3">

For a single instance, memory usage drops by 97MiB.

Timing benchmark:
```
--------------------------------------------------------------------------------
Benchmark                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
ClientIVCBench/Full/6      33216 ms        30637 ms            1
    Arithmetic::accumulate=3.89126M Arithmetic::accumulate(t)=7.32768G
    Auxiliary::accumulate=1.98134M Auxiliary::accumulate(t)=13.4156G
    COMMIT::databus=108 COMMIT::databus(t)=8.50634M
    COMMIT::databus_inverses=36 COMMIT::databus_inverses(t)=11.8267M
    COMMIT::ecc_op_wires=48 COMMIT::ecc_op_wires(t)=38.2178M
    COMMIT::lookup_counts_tags=12 COMMIT::lookup_counts_tags(t)=107.571M
    COMMIT::lookup_inverses=12 COMMIT::lookup_inverses(t)=257.772M
    COMMIT::wires=24 COMMIT::wires(t)=2.23405G
    COMMIT::z_perm=12 COMMIT::z_perm(t)=2.31578G
    DatabusRead::accumulate=447 DatabusRead::accumulate(t)=1.72333M
    Decider::construct_proof=1 Decider::construct_proof(t)=1.57152G
    DeciderProvingKey(Circuit&)=12 DeciderProvingKey(Circuit&)(t)=2.63528G
    DeltaRange::accumulate=1.87876M DeltaRange::accumulate(t)=4.27884G
    ECCVMProver(CircuitBuilder&)=1 ECCVMProver(CircuitBuilder&)(t)=228.84M
    ECCVMProver::construct_proof=1 ECCVMProver::construct_proof(t)=2.59672G
    Elliptic::accumulate=183.692k Elliptic::accumulate(t)=451.988M
    Goblin::merge=23 Goblin::merge(t)=116.924M
    Lookup::accumulate=1.66363M Lookup::accumulate(t)=3.74588G
    MegaFlavor::get_row=6.18564M MegaFlavor::get_row(t)=4.44329G
    OinkProver::execute_grand_product_computation_round=12 OinkProver::execute_grand_product_computation_round(t)=3.59852G
    OinkProver::execute_log_derivative_inverse_round=12 OinkProver::execute_log_derivative_inverse_round(t)=2.4985G
    OinkProver::execute_preamble_round=12 OinkProver::execute_preamble_round(t)=178.858k
    OinkProver::execute_sorted_list_accumulator_round=12 OinkProver::execute_sorted_list_accumulator_round(t)=683.402M
    OinkProver::execute_wire_commitments_round=12 OinkProver::execute_wire_commitments_round(t)=1.71268G
    OinkProver::generate_alphas_round=12 OinkProver::generate_alphas_round(t)=3.50247M
    Permutation::accumulate=10.6427M Permutation::accumulate(t)=40.1379G
    PoseidonExt::accumulate=30.452k PoseidonExt::accumulate(t)=76.6116M
    PoseidonInt::accumulate=210.454k PoseidonInt::accumulate(t)=365.722M
    ProtogalaxyProver::prove=11 ProtogalaxyProver::prove(t)=19.9675G
    ProtogalaxyProver_::combiner_quotient_round=11 ProtogalaxyProver_::combiner_quotient_round(t)=8.76403G
    ProtogalaxyProver_::compute_row_evaluations=11 ProtogalaxyProver_::compute_row_evaluations(t)=1.9728G
    ProtogalaxyProver_::perturbator_round=11 ProtogalaxyProver_::perturbator_round(t)=2.86884G
    ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key=11 ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key(t)=7.66211G
    ProtogalaxyProver_::update_target_sum_and_fold=11 ProtogalaxyProver_::update_target_sum_and_fold(t)=672.424M
    TranslatorCircuitBuilder::constructor=1 TranslatorCircuitBuilder::constructor(t)=32.9044M
    TranslatorProver=1 TranslatorProver(t)=43.1984M
    TranslatorProver::construct_proof=1 TranslatorProver::construct_proof(t)=832.913M
    batch_mul_with_endomorphism=16 batch_mul_with_endomorphism(t)=408.881M
    commit=543 commit(t)=6.5699G
    commit_sparse=36 commit_sparse(t)=11.813M
    compute_combiner=11 compute_combiner(t)=8.32169G
    compute_perturbator=11 compute_perturbator(t)=2.86857G
    compute_univariate=51 compute_univariate(t)=2.20204G
    construct_circuits=12 construct_circuits(t)=4.30706G
    pippenger=215 pippenger(t)=102.025M
    pippenger_unsafe_optimized_for_non_dyadic_polys=543 pippenger_unsafe_optimized_for_non_dyadic_polys(t)=6.56543G
Benchmarking lock deleted.
client_ivc_bench.json                          100% 6930   190.2KB/s   00:00
function                                            ms     % sum
construct_circuits(t)                             4307    13.35%
DeciderProvingKey(Circuit&)(t)                    2635     8.17%
ProtogalaxyProver::prove(t)                      19967    61.90%
Decider::construct_proof(t)                       1572     4.87%
ECCVMProver(CircuitBuilder&)(t)                    229     0.71%
ECCVMProver::construct_proof(t)                   2597     8.05%
TranslatorProver::construct_proof(t)               833     2.58%
Goblin::merge(t)                                   117     0.36%

Total time accounted for: 32257ms/33216ms = 97.11%

Major contributors:
function                                            ms     % sum
commit(t)                                         6570    20.37%
compute_combiner(t)                               8322    25.80%
compute_perturbator(t)                            2869     8.89%
compute_univariate(t)                             2202     6.83%

Breakdown of ProtogalaxyProver::prove:
ProtogalaxyProver_::run_oink_prover_on_each_incomplete_key(t)    7662    38.37%
ProtogalaxyProver_::perturbator_round(t)                          2869    14.37%
ProtogalaxyProver_::combiner_quotient_round(t)                    8764    43.89%
ProtogalaxyProver_::update_target_sum_and_fold(t)                  672     3.37%

Relation contributions (times to be interpreted relatively):
Total time accounted for (ms):  69802
operation                          ms     % sum
Arithmetic::accumulate(t)        7328    10.50%
Permutation::accumulate(t)      40138    57.50%
Lookup::accumulate(t)            3746     5.37%
DeltaRange::accumulate(t)        4279     6.13%
Elliptic::accumulate(t)           452     0.65%
Auxiliary::accumulate(t)        13416    19.22%
EccOp::accumulate(t)                0     0.00%
DatabusRead::accumulate(t)          2     0.00%
PoseidonExt::accumulate(t)         77     0.11%
PoseidonInt::accumulate(t)        366     0.52%

Commitment contributions:
Total time accounted for (ms):   4974
operation                          ms     % sum
COMMIT::wires(t)                 2234    44.92%
COMMIT::z_perm(t)                2316    46.56%
COMMIT::databus(t)                  9     0.17%
COMMIT::ecc_op_wires(t)            38     0.77%
COMMIT::lookup_inverses(t)        258     5.18%
COMMIT::databus_inverses(t)        12     0.24%
COMMIT::lookup_counts_tags(t)     108     2.16%
```

Compared to master, the notable differences are:

- `DeciderProvingKey(Circuit&)` went from 8043ms to 2635ms.
- `ProtogalaxyProver::prove` went from 20953ms to 19967ms. It is unclear whether this improvement is expected.
- `commit` went from 7033ms to 6570ms.
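The before/after figures above are easier to compare as relative changes. This is a minimal C++ sketch for that arithmetic; `percent_change` and `print_change` are hypothetical helpers written for this illustration and are not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in percent from a baseline to a new measurement.
// Negative values mean the new measurement is smaller (faster, or less memory).
double percent_change(double before, double after)
{
    assert(before != 0.0); // a zero baseline has no meaningful relative change
    return (after - before) / before * 100.0;
}

// Prints "name  before -> after unit (change%)"; `unit` is only a display label.
void print_change(const char* name, double before, double after, const char* unit)
{
    std::printf("%-28s %9.2f -> %9.2f %s (%+.1f%%)\n", name, before, after, unit, percent_change(before, after));
}
```

For example, `print_change("DeciderProvingKey(Circuit&)", 8043, 2635, "ms")` reports a change of about -67.2%. By the same formula, `ProtogalaxyProver::prove` improves by about 4.7%, `commit` by about 6.6%, and the ClientIVCBench memory figures (2377.99MiB to 1511.34MiB) correspond to a reduction of about 36.4%.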
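The memory savings listed at the top all follow one idea: a polynomial keeps its full (virtual) size, but only allocates the window of rows where it can be nonzero. The interface of the new structured polynomial class is not shown in this description, so the following is only a rough mental model under stated assumptions. `WindowedPolynomial`, its methods, and the 64-bit `Element` stand-in for a field element are all hypothetical names chosen for illustration, not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a field element; a real BN254 scalar is 32 bytes.
// A 64-bit integer keeps this sketch self-contained.
using Element = std::uint64_t;

// Hypothetical model of a "structured" polynomial: it has a virtual size (the full
// circuit size) but only allocates the window [start, start + window_size) where it
// can be nonzero. Every coefficient outside the window is implicitly zero.
class WindowedPolynomial {
  public:
    WindowedPolynomial(std::size_t virtual_size, std::size_t start, std::size_t window_size)
        : virtual_size_(virtual_size)
        , start_(start)
        , values_(window_size, Element{ 0 })
    {
        assert(start + window_size <= virtual_size);
    }

    std::size_t virtual_size() const { return virtual_size_; }
    std::size_t start() const { return start_; }
    std::size_t end() const { return start_ + values_.size(); }

    // Reads are allowed anywhere in [0, virtual_size); outside the window they return zero.
    Element get(std::size_t i) const
    {
        assert(i < virtual_size_);
        return (i >= start_ && i < end()) ? values_[i - start_] : Element{ 0 };
    }

    // Writes are only allowed inside the allocated window.
    void set(std::size_t i, Element value)
    {
        assert(i >= start_ && i < end());
        values_[i - start_] = value;
    }

    // Bytes actually backing the polynomial, versus a dense allocation of virtual_size.
    std::size_t allocated_bytes() const { return values_.size() * sizeof(Element); }
    std::size_t dense_bytes() const { return virtual_size_ * sizeof(Element); }

  private:
    std::size_t virtual_size_;
    std::size_t start_;
    std::vector<Element> values_;
};
```

Under this model, `lagrange_first` would be `WindowedPolynomial(n, 0, 1)` and `lagrange_last` would be `WindowedPolynomial(n, n - 1, 1)`, each backed by a single element instead of `n`. A gate selector whose block starts at row `offset` with fixed size `block_size` would be `WindowedPolynomial(n, offset, block_size)`, so the selectors together allocate roughly the sum of their block sizes rather than 8 × `n` elements.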