-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bernstein yang modular multiplicative inverter #2
Bernstein yang modular multiplicative inverter #2
Conversation
Hello, aleksei, I found your code has not been formatted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only comments to improve in general.
A good addition would be an inversion benchmark here https://github.com/taikoxyz/halo2curves/blob/main/benches/bn256_field.rs
struct ChunkInt<const B:usize, const L:usize>(pub [u64; L]); | ||
|
||
impl<const B:usize, const L:usize> ChunkInt<B,L> { | ||
/// Mask, in which the B lowest bits are 1 and only they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and only they"
Seems like the sentence is cut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it means "they and only they are 1".
src/bernsteinyang.rs
Outdated
/// The ordering of the chunks in these arrays is little-endian. | ||
/// The arithmetic operations for this type are wrapping ones. | ||
#[derive(Clone)] | ||
struct ChunkInt<const B:usize, const L:usize>(pub [u64; L]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's usually called unsaturated integers see:
- golang/go@b0c49ae
an unsaturated 51-bit limb field implementation optimized for 64-bit
architectures and math/bits.Mul64 intrinsics - https://github.com/mit-plv/fiat-crypto/tree/42e5455a3f95ee4f739245e75b667e0639beac51#usage-generating-c-files
There is a separate compiler binary for each implementation strategy:
- saturated_solinas
- unsaturated_solinas
- word_by_word_montgomery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this term is rarely used. On the other hand, "saturated integer" is usually used to describe the type, for which the saturation arithmetic is implemented: https://en.wikipedia.org/wiki/Saturation_arithmetic A least, my first thought was about the saturation arithmetic, when I saw this term somewhere before. Thus, I will leave this comment "as is" to avoid the ambiguation.
data[i - 1] = self.0[i]; | ||
} | ||
if self.is_negative() { | ||
data[L - 1] = Self::MASK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the mask in data[L - 2]
need to be cancelled?
A comment is needed, at least to attract attention for a potential bug during review/audit and also test input design.
Edit: Ah I understand the representation now. All limbs are 2-complement and negated, not just the most significant word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/bernsteinyang.rs
Outdated
let (mut data, mut carry) = ([0; L], 0); | ||
for i in 0..L { | ||
let sum = self.0[i] + other.0[i] + carry; | ||
data[i] = sum & ChunkInt::<B,L>::MASK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the sum is negative? You remove the negative tag there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of "sum" is never negative, since actually I operate on 2 non-negative integers in [0 .. 2 ^ (B * L) - 1] splited into B-bit non-negative chunks, but any x in [2 ^ (B * L - 1) .. 2 ^ (B * L) - 1] is considered to be the representation of -|2 ^ (B * L) - x|. The Mul, Add and Sub algorithms for the two's complement code do not care about the sign, and this is the main reason for them being used by the majority of processors. Since ChunkInt has a fixed size, it may afford to use this convenient code.
impl<const B:usize, const L:usize> Sub for &ChunkInt<B,L> { | ||
type Output = ChunkInt<B,L>; | ||
fn sub(self, other: Self) -> Self::Output { | ||
let (mut data, mut carry) = ([0; L], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment to explain that
-x = flip the bits of x and add 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This part of code has not been commented yet.
src/bernsteinyang.rs
Outdated
} | ||
|
||
let mask = (1 << steps.min(1 - delta).min(4)) - 1; | ||
let w = (g as i64).wrapping_mul(f.wrapping_mul(3) ^ 12) & mask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment in the vein of
Find the multiple of f to add to cancel the bottom min(steps, 4) bits of g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This part of code has not been commented yet; "f.wrapping_mul(3) ^ 12" will also been explained a bit.
src/bernsteinyang.rs
Outdated
(cd.shift(), ce.shift()) | ||
} | ||
|
||
fn norm(&self, mut value: ChunkInt<B,L>, negate: bool) -> ChunkInt<B,L> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should indicate the input range and output range
AFAIK:
Compute a = sign*a (mod M)
with a in range (-2*M, M)
result in range [0, M)
also why is the value mutated and returned as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I was going to comment this part of the code.
also why is the value mutated and returned as well?
It is mutable in order to mutate it within the function without introducing a new temporary variable.
It is returned, because the function takes the ownership of the argument instead of borrowing it mutably.
It is returned instead of being changed using a mutable reference because I want all function to be pure: https://en.wikipedia.org/wiki/Pure_function
src/bernsteinyang.rs
Outdated
} | ||
|
||
/// Returns the multiplicative inverse of the argument modulo 2^B. The implementation is based | ||
/// on the Hurchalla's method for computing the multiplicative inverse modulo a power of two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing link to paper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually I try to avoid overwhelming my code with direct links to papers easy to find, since I try to achieve better readability and aesthetics) However, I can add this one.
src/derive/field.rs
Outdated
@@ -24,6 +24,11 @@ macro_rules! field_common { | |||
$r2:ident, | |||
$r3:ident | |||
) => { | |||
/// Bernstein-Yang modular multiplicative inverter created for the modulus equal to | |||
/// the characteristic of the field to invert positive integers in the Montgomery form. | |||
const BYINVERTOR: crate::bernsteinyang::BYInverter<62,6> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for BN254 and secp256k1, BYInverter<62,5>
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I have checked this both experimentally (running the code for the bn254 Fq modulus with BYInverter<62,5>) and theoretically (deriving the formula 2 ^ (B * (L - 1) - 2) for the threshold for the values of the modulus and the argument).
let mut matrix; | ||
while g != ChunkInt::ZERO { | ||
(delta, matrix) = Self::step(&f, &g, delta); | ||
(f, g) = Self::fg(f, g, matrix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment: fg
updates can take a parameter "limbsLeft" to avoid iterating on the full L
all the time.
This might be left as a future optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this during planing the implementation, but decided not to do this, because:
- It will require some extra time to determine the number of unused limbs after each arithmetic operation;
- Only the f,g-update can benefit from it;
- For the two's complement code working with fixed-length register-like structure is much more convenient.
For much larger modulus this optimization makes sense, but I am not sure that it is needed here.
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I think the src filename should be "invmod.rs" or "modinv.rs" instead of Bernstein-Yang (i.e. what it does instead of how it does it), but the PR is in good shape and it will likely take a very long-time to review for the PSE team so better merge now and address their concerns.
…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>
…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>
Replacing the multiplicative inversion based on the Little Fermat Theorem with Bernstein yang multiplicative inversion. The latter one turned out to be about 8.5 times faster on average.