Test vectors & benchmark for PCS #222
Conversation
* refactor: removed ROConstantsTrait
  - trait refactored to Default in associated type,
  - obviates the need for a separate trait,
  - refactored call sites and only implementation
* fix: remove unneeded Sized bounds
* fix: rename VanillaRO -> NativeRO
Force-pushed from 2ca3654 to fb570c5.
I think that I've got the bulk of the logic down. A sample of what the benches look like can be found here: https://phnmlt.csb.app/
Now I would like to iterate on what we actually display in the benchmark and how it should be formatted. Currently, the results are divided into two groups: proving and verifying. They are displayed per the power of two used to generate the polynomial we are testing over. I'm not sure this fully corresponds to what we wanted.
I started to look into the Throughput measurements to capture proof size, but I'm not sure how to capture a proof size in a generic manner.
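One possible direction for the proof-size question, sketched under the assumption that the proof type implements serde::Serialize so it can be measured generically; the helper name and the use of bincode are mine, not part of this PR, and note criterion would then report the size as bytes/second rather than a raw size.

```rust
use criterion::{measurement::WallTime, BenchmarkGroup, Throughput};
use serde::Serialize;

// Hedged sketch: report the serialized proof size as the group's throughput so it
// shows up alongside the timings. `bincode` is only used here to count bytes.
fn record_proof_size<P: Serialize>(group: &mut BenchmarkGroup<'_, WallTime>, proof: &P) {
    let proof_bytes = bincode::serialize(proof)
        .expect("proof should serialize")
        .len();
    group.throughput(Throughput::Bytes(proof_bytes as u64));
}
```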
Force-pushed from 7e1862b to 9fea337.
This mostly looks great; I left some inline comments for simplification.
A rebase would be great, as it would allow us to see improvements from #216.
src/spartan/polys/multilinear.rs
Outdated
let poly = Self::new(
  std::iter::from_fn(|| Some(Scalar::random(&mut rng)))
    .take(1 << num_vars)
    .collect(),
);
This should use `random` to avoid code duplication.
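A minimal sketch of the suggested reuse, assuming the existing `random` constructor takes the number of variables and an RNG (the exact signature may differ):

```rust
// Hypothetical rewrite of the snippet above: reuse the existing `random`
// constructor instead of rebuilding the evaluation vector inline.
let poly = Self::random(num_vars, &mut rng);
```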
benches/pcs.rs
Outdated
fn deterministic_assets<E: Engine, EE: EvaluationEngineTrait<E>, R: CryptoRng + RngCore>(
The name throws me off: if you have an `&mut rng` as an argument, are you creating something deterministic?
(I notice you are using this with a seeded RNG, but the determinism isn't an invariant upheld by your function, rather by your usage of it.)
You're right that the naming here is off. I went with `deterministic...` because it was a pre-determined polynomial size used to generate the assets. I've moved the implementation to be over `BenchAssets` and renamed the method to `from_num_vars`; it should be more explicit this way.
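A rough sketch of what the reworked helper could look like; the struct fields and trait bounds are assumptions based on this thread, not the PR's actual code:

```rust
// Hypothetical shape: the shared benchmark assets live in a struct, and the
// constructor takes the number of variables plus a caller-provided RNG, so any
// determinism comes from the caller's seed rather than from this function.
struct BenchAssets<E: Engine, EE: EvaluationEngineTrait<E>> {
    poly: MultilinearPolynomial<E::Scalar>,
    point: Vec<E::Scalar>,
    eval: E::Scalar,
    prover_key: EE::ProverKey,
    verifier_key: EE::VerifierKey,
}

impl<E: Engine, EE: EvaluationEngineTrait<E>> BenchAssets<E, EE> {
    fn from_num_vars<R: CryptoRng + RngCore>(num_vars: usize, rng: &mut R) -> Self {
        // Generate a random polynomial and point, evaluate it, and set up the
        // evaluation engine keys; the body is elided in this sketch.
        unimplemented!()
    }
}
```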
src/spartan/polys/multilinear.rs
Outdated
@@ -73,6 +77,27 @@ impl<Scalar: PrimeField> MultilinearPolynomial<Scalar> {
  )
}

/// Returns a random polynomial and a point, and computes its evaluation.
pub fn random_with_eval<R: RngCore + CryptoRng>(
The use case for generating a random polynomial, a random point, and the corresponding evaluation is pretty narrow. So it may be quite confusing to see this in the public API.
Do we need this somewhere not in benchmark/testing code, or can we just move it to benchmark/testing code where it's needed?
If we need this both in benchmarks and testing, can we just inline this short function?
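For context, a hedged sketch of what inlining could look like at a test or bench call site; the `random` signature is assumed, while `evaluate_with` and `evaluations` appear elsewhere in this PR:

```rust
// Hypothetical inlined version: build the random polynomial and point directly
// and evaluate, instead of exposing a public `random_with_eval` helper.
let poly = MultilinearPolynomial::random(num_vars, &mut rng);
let point = (0..num_vars)
    .map(|_| <E as Engine>::Scalar::random(&mut rng))
    .collect::<Vec<_>>();
let eval = MultilinearPolynomial::evaluate_with(poly.evaluations(), &point);
```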
This was actually an update to my code based on a discussion with @adr1anh, who pointed out that if we had a `random` we could have a `random_with_eval`.
What you're saying about it being a test function is true, and the same holds for `random`. I could remove both from the public API and instead inline their code where we have some usage.
Would that be alright with you?
The use case for `random` is a bit larger than `random_with_eval`, so I'd tend to leave `random` and move `random_with_eval`. I think my overall point is that as long as we don't add things to the public API of Arecibo, any decision we take is a 2-way door.
Moved `random_with_eval` only to eval :)
benches/pcs.rs
Outdated
// Macro to generate benchmark code for multiple engine and evaluation engine types
macro_rules! benchmark_engines {
  ($ell:expr, $rng:expr, $group:expr, $internal_fn:expr, $( ($engine:ty, $eval_engine:ty, $engine_name:expr) ),*) => {
You can use `type_name!` to avoid the need for the last parameter.
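Presumably this refers to something like `std::any::type_name`; a rough sketch of how the extra name parameter could be dropped inside the macro (assumed, not the PR's actual change):

```rust
// Hypothetical macro fragment: derive a display label from the engine type via
// std::any::type_name, so callers no longer pass a separate $engine_name.
macro_rules! benchmark_engines {
    ($ell:expr, $rng:expr, $group:expr, $internal_fn:expr, $( ($engine:ty, $eval_engine:ty) ),*) => {
        $(
            let engine_name = std::any::type_name::<$engine>();
            // ...use `engine_name` when building the criterion benchmark ids...
        )*
    };
}
```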
benches/pcs.rs
Outdated
macro_rules! benchmark_engines {
  ($ell:expr, $rng:expr, $group:expr, $internal_fn:expr, $( ($engine:ty, $eval_engine:ty, $engine_name:expr) ),*) => {
    $(
      let mut assets = deterministic_assets::<$engine, $eval_engine, StdRng>($ell, &mut $rng);
Once you populate the `eval_engine`, type inference should populate the first parameter. IOW, this should work and remove the need for the `engine` param:
deterministic_assets::<_, $eval_engine, StdRng>($ell, &mut $rng);
src/provider/util/mod.rs
Outdated
/// Generates a random polynomial and point from a seed to test a proving/verifying flow of one
/// of our EvaluationEngine over a given Engine.
pub(crate) fn prove_verify_from_ell<E: Engine, EE: EvaluationEngineTrait<E>>(ell: usize) {
This function looks great, though I'm not sure about its name: `ell` isn't standard nomenclature.
Here are things that would help me grok what this function is for out of context, and that you could use to edit the comment, the name of this function, or more likely both (a rough sketch follows after this list):
- make it clear we're testing the prove and verify flow of Multilinear Polynomial Commitment Schemes (PCS), which in Nova are represented by `EvaluationEngineTrait` implementations for proving,
- replace `ell` with the more evocative `num_vars`, which becomes clear once the first bullet point has been stated clearly,
- avoid the mention of an `Engine`, which is implied and selected by a choice of `EvaluationEngineTrait` implementation.
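A hedged sketch of a rename along those lines; the doc wording and the elided body are placeholders, not the final code:

```rust
/// Tests the prove/verify flow of a multilinear polynomial commitment scheme
/// (represented by an `EvaluationEngineTrait` implementation) over a random
/// polynomial in `num_vars` variables generated from a fixed seed.
pub(crate) fn prove_verify_from_num_vars<E: Engine, EE: EvaluationEngineTrait<E>>(num_vars: usize) {
    // Same body as the current `prove_verify_from_ell`, with `ell` renamed to `num_vars`.
    unimplemented!()
}
```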
  },
)
.unzip();
.zip_eq(offsets_of_x)
Weird that `cargo fmt` didn't complain with the original indentation!?
Yes, pretty strange. I kept the update as it seemed OK.
One issue that only pops off the page now: at large values of ell, the generation of a proof is expensive, as you've no doubt noticed. Nonetheless, the output is completely deterministic.
One bad thing that's obscured by the `benchmark_engines` macro is that you're generating assets for a specific size (and EE choice), benching the proving, then re-generating those assets for the verification, and finally benching the verification. The assets could and should be shared between proving and verification, I think.
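A hedged sketch of the sharing idea; the helper names (`prove_with`, `verify_with`), the seed constant, and the group layout are placeholders, and in the real benchmark the engine types would come from the macro:

```rust
// Hypothetical reorganization: build the assets once per (size, evaluation engine)
// and reuse the same instance for both the proving and the verifying benchmark,
// instead of regenerating them inside each group.
let mut rng = StdRng::seed_from_u64(BENCH_SEED); // BENCH_SEED: assumed constant
let assets = BenchAssets::<E, EE>::from_num_vars(num_vars, &mut rng);

{
    let mut proving_group = c.benchmark_group("PCS-Proving");
    proving_group.bench_function(format!("prove/{num_vars}"), |b| b.iter(|| prove_with(&assets)));
    proving_group.finish();
}
{
    let mut verifying_group = c.benchmark_group("PCS-Verifying");
    verifying_group.bench_function(format!("verify/{num_vars}"), |b| b.iter(|| verify_with(&assets)));
    verifying_group.finish();
}
```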
benches/pcs.rs
Outdated
criterion_main!(pcs);

const TEST_ELL: [usize; 11] = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20];
I'm not sure we need all points in the [10-20] interval. [10, 12, 14, 16, 18, 20] seems amply sufficient to me.
/cc @adr1anh
Given that the protocols we are benchmarking are linear in `ell`, I think it's fair to sample fewer iterations, as we should be able to interpolate the results to the odd sizes.
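For instance, the swept sizes could shrink to even values only (a sketch, not necessarily the PR's final constant):

```rust
// Hypothetical reduced sweep: even sizes only, relying on the linearity in `ell`
// to interpolate the intermediate points.
const TEST_ELL: [usize; 6] = [10, 12, 14, 16, 18, 20];
```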
if evaluate_bad_proof {
  // Generate another point to verify proof. Also produce eval.
  let altered_verifier_point = point
    .iter()
    .map(|s| s.add(<E as Engine>::Scalar::ONE))
    .collect::<Vec<_>>();
  let altered_verifier_eval =
    MultilinearPolynomial::evaluate_with(poly.evaluations(), &altered_verifier_point);
Instead of running the verification twice, you can do something like:

```rust
let (point, eval) = if evaluate_bad_proof {
  (altered_verifier_point, altered_verifier_eval)
} else {
  (point, eval)
};

// Verify the proof; it should fail exactly when `evaluate_bad_proof` is set.
let mut verifier_transcript = E::TE::new(b"TestEval");
let res = EE::verify(
  &verifier_key,
  &mut verifier_transcript,
  commitment,
  &point,
  &eval,
  &proof,
);
assert_eq!(res.is_err(), evaluate_bad_proof);
```
That's true; however, this would also change how the method behaves. Currently, if `evaluate_bad_proof` is set to `true`, the method tests both a valid and an invalid proof, while your proposal means I only test one of the two cases. The downside is that I would then have to call this helper method twice for unit tests, doubling the number of `prove` calls instead.
Even after this comment, if you still think I should go with your proposal I'll be happy to do it :)
LGTM!
Obviously, it is hard to make statements about performance without running benchmarks on a variety of `ell` values, but I would maybe consider specifying the `ell` as a single value via the command line, which can be useful when running a single instance of the benchmark on a developer's machine in order to get very rough information on whether performance has increased / decreased / stayed unchanged after some PR is merged.
benches/pcs.rs
Outdated
{
  let mut proving_group = c.benchmark_group("PCS-Proving");
  proving_group
    .sampling_mode(SamplingMode::Flat)
As far as I understand, `Flat` is only suitable for very long-running benchmarks. Probably `Auto` can be more appropriate for the short-running ones, where `ell` is say 10 / 11 / 12. Does it make sense to split the benchmarks into short / long running categories?
Funny enough: in general the `Flat` sampling mode is fine for most cases ... except I looked at the sample distribution for this PR, and the standard deviation for ZM verification at ell = 10 doesn't look great. You're probably right @storojs72, good catch!
Though I would note one very simple way of addressing this is noting the bench runtimes, which are reasonable (<20s per ell setting, see the bench log on my M1 Mac at https://gist.github.com/huitseeker/015f07534f410e94d615684ecd86876f), re-running them without setting the sampling mode (defaulting to `Auto`), and seeing if the runtimes are still reasonable.
As we have removed all odd sizes, we are only left with 10 and 12 from your example. I'm not sure there is a need to split the code into two cases.
I would either make it `Auto` for all (might not be great for the higher-end numbers of variables) or `Flat` for all (losing a few iterations on smaller numbers of variables, but we should still get enough to have convincing results).
I'm explicitly suggesting Auto for all, and then revisiting the question if the overall bench runtime is larger than say 15 min.
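For reference, a minimal sketch of a group left on the default (`Auto`) sampling mode; the numbers are illustrative, not the PR's settings:

```rust
// Hypothetical group setup: leave the sampling mode at its default (Auto) and
// only bound the sample count and measurement time.
let mut proving_group = c.benchmark_group("PCS-Proving");
proving_group
    .sample_size(10)
    .measurement_time(std::time::Duration::from_secs(10));
```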
I don't know why I couldn't see your comment when I put mine up. I'll update to `Auto`.
benches/pcs.rs
Outdated
)
.unwrap();

BenchAssests {
BenchAssets?
Assets used during our benchmark that are shared between the proving and verification steps. Useful to avoid having lengthy method signatures.
A way to do this is to have the ELL array (currently a constant) be overridden by an environment variable. We can probably open an issue and tackle this after merging this PR.
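A hedged sketch of that follow-up; the variable name and the parsing are placeholders, not part of this PR:

```rust
// Hypothetical override: read a comma-separated list of sizes from PCS_BENCH_ELL,
// falling back to the compile-time TEST_ELL constant when the variable is unset.
fn bench_sizes() -> Vec<usize> {
    match std::env::var("PCS_BENCH_ELL") {
        Ok(list) => list
            .split(',')
            .map(|v| v.trim().parse().expect("PCS_BENCH_ELL entries must be integers"))
            .collect(),
        Err(_) => TEST_ELL.to_vec(),
    }
}
```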
Force-pushed from 61a25a5 to 1293d9b.
* test(ipa): proof verified
* test(ipa): switch from keccak to grumpkin transcript
* test(ipa): wip test IPA prove & verify
* chore: added Jetbrain config to gitignore
- Implemented test_fail_bad_proof, which generates a test case from a seed to try and verify an invalid proof.
- Implemented both test_fail_bad_proof & test_from_seed for IPA over Grumpkin, MLKZG over Bn256 and ZM over Bn256.
- Created a benchmark for proof generation and verification for IPA over Grumpkin, MLKZG over Bn and ZM over Bn.
- Made multilinear.rs public for benchmark purposes.
- Refactored generic test methods per @adr1anh's comment.
Force-pushed from 181cb54 to 452f95c.
I've re-run the benches on the same machine and the end-to-end runtime stands at 00:16:30, which I deem acceptable.
This looks great @tchataigner, thank you so much!
See #222, the commit_open_prove_verify test is subsumed by the subsequent unit test. @tchataigner was right from the start.
Goal of this PR
The goal of this PR is to implement test vectors & benchmark for PCS.
Current progress
Implemented two generic test functions that we can use over any `<Engine, EvaluationEngine<Engine>>` for both the correct prove/verify flow and a bad-proof verification flow.
Left TODO
Related issue
This is part of tackling issue #212.