-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95
Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95
Conversation
Actually speedup may not be 16.8x but only 10x, quote from original PR
|
To answer this part specifically: there should be no breakage, since PR #77 was not part of any released version of halo2curves (0.4.0 at the latest). Though that should not be taken as a sign of support for a slow release schedule on my part! |
* Bernstein yang modular multiplicative inverter (#2) * rename similar to #95 --------- Co-authored-by: Aleksei Vambol <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks again for improving the library.
Although I still don't fully understand the reason of i
being 30
, but I don't want to block the PR. Other parts could be referenced from Computing the Jacobi symbol using Bernstein-Yang and Optimized Binary GCD for Modular Inversion.
Additional small notes for perhaps other reviewers:
(b ^ b >> 1) & 2 == 2
is equal tob % 8 == 3 || b % 8 == 5
whenb
is odd.- This python version might be the most unoptmized one to understand the algorithm https://gist.github.com/robot-dreams/ceb00162b80384f2ae1913aaa2b35e75 (also quoted in Perf:
isSquare
- constant-time Jacobi/Kronecker/Legendre symbol using fast GCD mratsim/constantine#199)
Thank you for the review! The python code you mentioned is not related to my modification of the Pornin's method. It seems to be the Wuille’s Bernstein-Yang-based method of Jacobi symbol computation. It is somewhat slower than the proposed method and lacks the proof of convergence. More information about this method can be found here: https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md (section 8). Regarding the number of the inner loop's iterations, which is 30 instead of 31: The initial Pornin's method for modular inversion performs 31 iterations of the inner loop and uses "approximations" of the variables, which contain the arguments at the beginning of the algorithm's execution. These "approximations" are defined in the following way. Let n be the bit length of the largest argument without leading zeros. For n > 64 the "approximation" of the argument, which equals v, is (v div 2 ^ (n - 33)) * 2 ^ 33 + (v mod 2 ^ 31), i.e. it retains the 33 high and 31 low bits of the n-bit representation of v. If n does not exceed 64, an argument and its "approximation" are equal. It is not hard to see, that before each of the 31 iterations of the inner loop the least significant bit of each "approximation" is the same as the least significant bit of the value that the corresponding variable would have had if it had been updated in the inner loop along with the "approximations". Let us call this value the "potential precise value". Before the first iteration of the inner loop an "approximation" and the corresponding "potential precise value" are equal modulo 2^31, before the second iteration they are equal at least modulo 2^30 and so on. the This allows the initial Pornin's method to make the correct decisions, which are related to parity checks. However, we need an "approximation" to be equal to the corresponding "potential precise value" at least modulo 8 before each iteration to use the properties of the modified Jacobi symbol (x / |y|). The naive approach lies in performing just 29 iterations in the inner loop. However, I decided to change the "approximation" method in the terms of the used formula: here it is (v div 2 ^ (n - 32)) * 2 ^ 32 + (v mod 2 ^ 32), i.e. it retains the 32 high and 32 low bits of the n-bit representation of v. This modification allows to perform 30 iterations instead of 29. The admissibility of this modification I have proven using the appropriately modified Pornin's theorems. PS I think I have to make a part of comments more accurate: replace "corresponding "precise" variables' values, which would be computed, if the "precise" variables were used in the inner loop instead of the "approximation" ones." with "corresponding "precise" variables' values, which would have been computed, if the "precise" variables had been updated in the inner loop along with the "approximations"." Should I do this after merging or now? PPS Among the several improvements the most practically valuable one lies in using the short-arithmetic binary Euclidean algorithm to finish the computation, when the modified Pornin method’s "approximations" become absolutely precise. |
Yeah I mean to understand Computing the Jacobi symbol using Bernstein-Yang the script might help, and from it to your optmized implementation is easier for myself, since I couldn't find other implementation taking the same strategy.
I see it now, thanks for clarifying, I think this is what I was missing.
Feel free to do so, besides the PR needs a rebase. |
70f16ce
to
33b23d4
Compare
…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks so much for this work!
81a0782
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80) * feat: add field conversion to/from `[u64;4]` * Added conversion tests * Added `montgomery_reduce_short` for no-asm * For bn256, uses assembly conversion when asm feature is on * fix: remove conflict for asm * chore: bump rust-toolchain to 1.67.0 * Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77) * Add `Legendre` trait and macro - Add Legendre macro with norm and legendre symbol computation - Add macro for automatic implementation in prime fields * Add legendre macro call for prime fields * Remove unused imports * Remove leftover * Add `is_quadratic_non_residue` for hash_to_curve * Add `legendre` function * Compute modulus separately * Substitute division for shift * Update modulus computation * Add quadratic residue check func * Add quadratic residue tests * Add hash_to_curve bench * Implement Legendre trait for all curves * Move misplaced comment * Add all curves to hash bench * fix: add suggestion for legendre_exp * fix: imports after rebase * Add simplified SWU method (privacy-scaling-explorations#81) * Fix broken link * Add simple SWU algorithm * Add simplified SWU hash_to_curve for secp256r1 * add: sswu z reference * update MAP_ID identifier Co-authored-by: Han <[email protected]> --------- Co-authored-by: Han <[email protected]> * Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82) * refactor: bring back curve algorithms for `a = 0` * fix: clippy warning * fix: Improve serialization for prime fields (privacy-scaling-explorations#85) * fix: Improve serialization for prime fields Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json). - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries. - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use. - Enhanced error checking in the custom serialization methods to ensure valid field elements. - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking. * fixup! fix: Improve serialization for prime fields --------- Co-authored-by: Carlos Pérez <[email protected]> * refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88) * refactor: implement (De)Serialization of points using the `GroupEncoding` trait - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait. * fix: add explicit json serde tests * Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86) * Insert MSM and FFT code and their benchmarks. Resolves taikoxyz/zkevm-circuits#150. * feedback * Add instructions * feeback * Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation. * Use slicing and parallelism to to decrease running time. Laptop measurements: k=22: 109 sec k=16: 1 sec * Refactor msm * Refactor fft * Update module comments * Fix formatting * Implement suggestion for fixing CI * Re-export also mod `pairing` and remove flag `reexport` to alwasy re-export (privacy-scaling-explorations#93) fix: re-export also mod `pairing` and remove flag `reexport` to alwasy re-export * fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94) fix regression in privacy-scaling-explorations#93, field benches aren't run * Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]> * Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95) * Derivatives of the Pornin's method (taikoxyz#3) * renaming file * make cargo fmt happy * clarifications from privacy-scaling-explorations#95 (comment) [skip ci] * Formatting and slightly changing a comment --------- Co-authored-by: Aleksei Vambol <[email protected]> * chore: delete bernsteinyang module (replaced by ff_inverse) * Bump version to 0.4.1 --------- Co-authored-by: David Nevado <[email protected]> Co-authored-by: Han <[email protected]> Co-authored-by: François Garillot <[email protected]> Co-authored-by: Carlos Pérez <[email protected]> Co-authored-by: einar-taiko <[email protected]> Co-authored-by: Mamy Ratsimbazafy <[email protected]> Co-authored-by: Aleksei Vambol <[email protected]>
…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>
…rivacy-scaling-explorations#95) * Derivatives of the Pornin's method (taikoxyz#3) * renaming file * make cargo fmt happy * clarifications from privacy-scaling-explorations#95 (comment) [skip ci] * Formatting and slightly changing a comment --------- Co-authored-by: Aleksei Vambol <[email protected]>
Upstreaming on behalf of @AlekseiVambol - original: taikoxyz#3
This implements optimized isSquare/legendre symbol as suggested in #73 (comment).
Performance
The performance improvement is an incredible 16.8x acceleration over exponentiation by
(p-1)/2
.API direction
At the moment the procedure is named
jacobi
.PR #77 introduced the
Legendre
trait.(p-1)/2
needed to abstract away the usage of curve-specific modulus constants.In that case, the trait can be deleted, it should be recent enough that breakage is minimal if any (cc @huitseeker).
legendre
procedure can calljacobi
instead and theLegendre
trait can be preserved. Though I'm curious what's the use-case of a trait instead of a function in that case.Implementation
After extensive testing between Bernstein-Yang and Pornin's algorithms,m it turns out that:
References:
isSquare
- constant-time Jacobi/Kronecker/Legendre symbol using fast GCD mratsim/constantine#199