WIP optimizations #1
Change multiopen commitment scheme to KZG
…-unused-ci rm unused CI checks
Hi @Brechtpd |
Hey @noctrlz, how's your assembly implementation in privacy-scaling-explorations/pairing#2 going? If it's ready to be used, I would very much like to have it in this branch as soon as possible, because I think it will probably make a big difference! :) Because the current field operations are pretty slow, every field operation saved makes a big difference, but that may no longer be the case once they are well optimized, so this may affect what is worth doing and what is not. For example, with the FFT, radix-4 is currently never faster, while in my libsnark implementation it was a bit faster to use it when possible. I think the FFT implementation could use some improvements, but I already have some misc ideas for that and wanted to wait for the assembly field optimizations to be ready before looking into it further. One potentially interesting optimization that I have not looked at at all is removing zero knowledge from the prover. Because of my very limited knowledge of PLONK, I don't know whether this will save a lot of operations. I know there are "blinding factors" and some extra calculations in the lookup tables, but I have no idea what other things could be removed. Do you have an idea, or can you help find this out? Another important one is still reusing the intermediate results for the h() polynomial. You can see in I may be forgetting some things right now, so I'll update if I think of others. |
All assembly arithmetic was completed and the original author of
This is an interesting idea 👍
Okay! |
Ah nice! Seems like the review has been pending for quite some time now. Do you think it's worth just getting it in as is on this branch for some testing?
Hmmm, not sure I understand. Looking at the halo2 docs for lookups, it seems that having zero knowledge is only a small adjustment to the main lookup algorithm. Why would, e.g., not doing this adjustment make it impossible to use lookups? EDIT: I think if we remove the zk stuff from the lookup calculations we can lower the degree by one, which could be pretty important because it would allow us to get a circuit with an extended domain of only 2x the normal domain (we currently have an extended domain of 16x). With the zk calculations the lowest we would be able to get is 4x. |
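To make the 2x versus 4x point concrete, here is a minimal sketch of the sizing rule, assuming the extended domain is the smallest power-of-two multiple of the normal domain that can hold a quotient polynomial of degree (d - 1) * n, where d is the maximum constraint degree. `extension_factor` is a hypothetical helper written for illustration, not a halo2 function, and the degrees used below are illustrative values, not ones stated in this thread.

```rust
/// Smallest power-of-two blow-up factor for the extended domain.
///
/// Assumption: the quotient polynomial has degree (max_degree - 1) * n, so the
/// extended domain needs at least (max_degree - 1) times the normal domain size,
/// rounded up to a power of two. Hypothetical helper, not part of halo2.
fn extension_factor(max_degree: u64) -> u64 {
    assert!(max_degree >= 2, "need at least degree 2 for a non-trivial quotient");
    let quotient_degree = max_degree - 1;
    // Round up to the next power of two (a power of two maps to itself).
    quotient_degree.next_power_of_two()
}
```

Under this assumption, `extension_factor(3)` gives 2 and `extension_factor(4)` gives 4, so a single extra degree is enough to double the extended domain, while `extension_factor(17)` gives 16.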
Exactly.
Yeah it seems. |
Hi @Brechtpd
I am going to integrate it into the FFT. |
Awesome! Eager to see how it behaves. :)
I guess the only really important one is the parallelization one, so that the FFT code works well on all CPUs; the other ones are only minor possible optimizations. |
Some basic testing shows the heaviest arithmetic steps are ~30% faster, and overall prover time decreased by 20-25% without any other changes that might make better use of the faster field ops now. :) One thing to think about is that the assembly code uses |
It also looks like the FFT will be much less important after some circuit changes. I currently think the multi exps will be the most important part to optimize, so I would probably hold off on doing the smaller FFT optimizations until it's clear they would actually make a decent difference. |
I am benching as well on privacy-scaling-explorations/zkevm-circuits#302.
Thank you for the review!
Okay. |
And it seems we should rebase onto upstream halo2 and its breaking changes. |
Not much different from the standard bench code. I just modified it a little so the test circuit actually does opcodes instead of everything being empty, but I don't think that really changes things currently (I did it more as a precaution). |
Hi @Brechtpd, I am going to bench the prove function as well. |
Reduce memory overhead of MSM
Contains:
h() evaluation much more efficiently. Roughly 4x faster using 8x less memory for the current zkEVM circuit.
In general only doing high level optimizations here.
Next steps:
product_coset/permuted_input_cosets/permuted_table_cosets. The calculations aren't ideal, so not sure if it's possible to not have a table/lookup expr at all. EDIT: Done, but in an unsatisfying way. I believe some more savings are possible, but they may be a bit messier.
h() evaluation it's very important to be able to reuse intermediate results. Unfortunately the Rust compiler doesn't seem to do this well for us (I assume because it's much harder when the calculations are field operations). The current algorithm is pretty naive, so I expect better results to be possible with something smarter. Ideally we could let something like LLVM do this optimization.
Also more things described at privacy-scaling-explorations#15 (comment)
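As a rough illustration of reusing intermediate results in the h() evaluation, the sketch below interns each subexpression in a small graph so that structurally identical subexpressions are stored and evaluated only once. This is not the algorithm used in this PR: `Node`, `Graph`, and the toy modulus `P` are hypothetical names chosen for the example, and plain `u64` arithmetic modulo a small prime stands in for the real field.

```rust
use std::collections::HashMap;

/// Toy prime modulus standing in for the real field (assumption for illustration).
const P: u64 = 2_147_483_647; // 2^31 - 1, so products of two elements fit in u64

/// A node in the expression graph. Children are referenced by index, so
/// structurally identical subexpressions can share a single node.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Node {
    Const(u64),
    Var(usize),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Hypothetical expression graph with common-subexpression sharing.
#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
    index: HashMap<Node, usize>,
}

impl Graph {
    /// Return the id of `node`, inserting it only if it has not been seen before.
    fn intern(&mut self, node: Node) -> usize {
        if let Some(&id) = self.index.get(&node) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(node);
        self.index.insert(node, id);
        id
    }

    fn constant(&mut self, c: u64) -> usize {
        self.intern(Node::Const(c % P))
    }

    fn var(&mut self, i: usize) -> usize {
        self.intern(Node::Var(i))
    }

    /// Operands are sorted so that a + b and b + a map to the same node.
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.intern(Node::Add(a.min(b), a.max(b)))
    }

    /// Operands are sorted so that a * b and b * a map to the same node.
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.intern(Node::Mul(a.min(b), a.max(b)))
    }

    /// Evaluate every unique node exactly once. Children are always interned
    /// before their parents, so insertion order is a valid evaluation order.
    fn eval_all(&self, vars: &[u64]) -> Vec<u64> {
        let mut out = Vec::with_capacity(self.nodes.len());
        for node in &self.nodes {
            let value = match *node {
                Node::Const(c) => c,
                Node::Var(i) => vars[i] % P,
                Node::Add(a, b) => (out[a] + out[b]) % P,
                Node::Mul(a, b) => (out[a] * out[b]) % P,
            };
            out.push(value);
        }
        out
    }
}
```

With this structure, building `x + y` twice (in either operand order) yields the same node id, so the sum is computed once per evaluation point no matter how many constraints mention it. A production version would presumably also need cost-aware ordering and buffer reuse, which this sketch leaves out.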