Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Optim] Square root on Fp2 #2

Closed
mratsim opened this issue Jun 23, 2020 · 3 comments
Closed

[Optim] Square root on Fp2 #2

mratsim opened this issue Jun 23, 2020 · 3 comments

Comments

@mratsim
Copy link
Contributor

mratsim commented Jun 23, 2020

Is the addition chain to exponentiate by a^((p^2-9)/16) faster than the method mentioned in

Adj, G. and F. Rodriguez-Henriquez, "Square Root Computation over Even Extension Fields",
DOI 10.1109/TC.2013.145, pages 2829-2841, In IEEE
Transactions on Computers. vol 63 issue 11, November 2014,
https://doi.org/10.1109/TC.2013.145.
https://eprint.iacr.org/2012/685.pdf

image

In Nim, assuming we have sqrt_invsqrt which returns both the sqrt and inverse sqrt on Fp

func sqrt_if_square*(a: var QuadraticExt): SecretBool =
  ## If ``a`` is a square, compute the square root of ``a``
  ## if not, ``a`` is unmodified.
  ##
  ## The square root, if it exist is multivalued,
  ## i.e. both x² == (-x)²
  ## This procedure returns a deterministic result
  #
  # Implementation via the complex method (which confusingly does not require a complex field)
  # We make it constant-time via conditional copies

  mixin fromComplexExtension # TODO: relax this
  static: doAssert a.fromComplexExtension()

  var t1{.noInit.}, t2{.noInit.}, t3{.noInit.}: typeof(a.c0)

  t1.square(a.c0) #     a0²
  t2.square(a.c1) # - β a1² with β = 𝑖² in a complex extension field

  t1 += t2        # a0 - (-1) a1²
  result = t1.sqrt_if_square()

  t2.sum(a.c0, t1)
  t2.div2()

  t3.diff(a.c0, t1)
  t3.div2()

  let quadResidTest = t2.isSquare()
  t2.ccopy(t3, not quadResidTest)

  sqrt_invsqrt(sqrt = t1, invsqrt = t3, t2)
  a.c0.ccopy(t1, result)

  t3.div2()
  t3 *= a.c1
  a.c1.ccopy(t3, result)
@dot-asm
Copy link
Collaborator

dot-asm commented Jul 3, 2020

Just a note that feedback is appreciated and is being considered. Thanks!

@mratsim
Copy link
Contributor Author

mratsim commented Sep 3, 2020

Diving into the code here:

blst/src/exp2.c

Lines 86 to 179 in 5c41509

static limb_t sqrt_align_fp2(vec384x out, const vec384x ret,
const vec384x sqrt, const vec384x inp)
{
static const vec384x sqrt_minus_1 = { { 0 }, { ONE_MONT_P } };
static const vec384x sqrt_sqrt_minus_1 = {
/*
* "magic" number is ±2^((p-3)/4)%p, which is "1/sqrt(2)",
* in quotes because 2*"1/sqrt(2)"^2 == -1 mod p, not 1,
* but it pivots into "complex" plane nevertheless...
*/
{ TO_LIMB_T(0x3e2f585da55c9ad1), TO_LIMB_T(0x4294213d86c18183),
TO_LIMB_T(0x382844c88b623732), TO_LIMB_T(0x92ad2afd19103e18),
TO_LIMB_T(0x1d794e4fac7cf0b9), TO_LIMB_T(0x0bd592fc7d825ec8) },
{ TO_LIMB_T(0x7bcfa7a25aa30fda), TO_LIMB_T(0xdc17dec12a927e7c),
TO_LIMB_T(0x2f088dd86b4ebef1), TO_LIMB_T(0xd1ca2087da74d4a7),
TO_LIMB_T(0x2da2596696cebc1d), TO_LIMB_T(0x0e2b7eedbbfd87d2) }
};
static const vec384x sqrt_minus_sqrt_minus_1 = {
{ TO_LIMB_T(0x7bcfa7a25aa30fda), TO_LIMB_T(0xdc17dec12a927e7c),
TO_LIMB_T(0x2f088dd86b4ebef1), TO_LIMB_T(0xd1ca2087da74d4a7),
TO_LIMB_T(0x2da2596696cebc1d), TO_LIMB_T(0x0e2b7eedbbfd87d2) },
{ TO_LIMB_T(0x7bcfa7a25aa30fda), TO_LIMB_T(0xdc17dec12a927e7c),
TO_LIMB_T(0x2f088dd86b4ebef1), TO_LIMB_T(0xd1ca2087da74d4a7),
TO_LIMB_T(0x2da2596696cebc1d), TO_LIMB_T(0x0e2b7eedbbfd87d2) }
};
vec384x coeff, t0, t1;
limb_t is_sqrt, flag;
/*
* Instead of multiple trial squarings we can perform just one
* and see if the result is "rotated by multiple of 90°" in
* relation to |inp|, and "rotate" |ret| accordingly.
*/
sqr_fp2(t0, sqrt);
/* "sqrt(|inp|)"^2 = (a + b*i)^2 = (a^2-b^2) + 2ab*i */
/* (a^2-b^2) + 2ab*i == |inp| ? |ret| is spot on */
sub_fp2(t1, t0, inp);
is_sqrt = vec_is_zero(t1, sizeof(t1));
vec_copy(coeff, BLS12_381_Rx.p2, sizeof(coeff));
/* -(a^2-b^2) - 2ab*i == |inp| ? "rotate |ret| by 90°" */
add_fp2(t1, t0, inp);
vec_select(coeff, sqrt_minus_1, coeff, sizeof(coeff),
flag = vec_is_zero(t1, sizeof(t1)));
is_sqrt |= flag;
/* 2ab - (a^2-b^2)*i == |inp| ? "rotate |ret| by 135°" */
sub_fp(t1[0], t0[0], inp[1]);
add_fp(t1[1], t0[1], inp[0]);
vec_select(coeff, sqrt_sqrt_minus_1, coeff, sizeof(coeff),
flag = vec_is_zero(t1, sizeof(t1)));
is_sqrt |= flag;
/* -2ab + (a^2-b^2)*i == |inp| ? "rotate |ret| by 45°" */
add_fp(t1[0], t0[0], inp[1]);
sub_fp(t1[1], t0[1], inp[0]);
vec_select(coeff, sqrt_minus_sqrt_minus_1, coeff, sizeof(coeff),
flag = vec_is_zero(t1, sizeof(t1)));
is_sqrt |= flag;
/* actual "rotation" */
mul_fp2(out, ret, coeff);
return is_sqrt;
}
static limb_t recip_sqrt_fp2(vec384x out, const vec384x inp)
{
vec384x ret, sqrt;
recip_sqrt_fp2_9mod16(ret, inp);
mul_fp2(sqrt, ret, inp);
/*
* Now see if |ret| is or can be made 1/sqrt(|inp|)...
*/
return sqrt_align_fp2(out, ret, sqrt, inp);
}
static limb_t sqrt_fp2(vec384x out, const vec384x inp)
{
vec384x ret;
recip_sqrt_fp2_9mod16(ret, inp);
mul_fp2(ret, ret, inp);
/*
* Now see if |ret| is or can be made sqrt(|inp|)...
*/
return sqrt_align_fp2(out, ret, ret, inp);
}

We have roughly currently:

  • An Fp2 addition chain with 894 operations
  • 3 squarings/mul and a couple of cheap additions/substractions/conditional select

Adj-Hendriquez paper would require:

  1. line 2. An optional Fp addition chain for SQRT (short circuiting in non-constant time implementation)
  2. line 5. An Fp addition chain for the Legendre symbol (p-1)/2 that can be merged in the following SQRT
  3. line 9. An Fp addition chain for SQRT ~458 operations
  4. line 11. An Fp addition chain for Legendre symbol ~458 operations
  5. line 15. An Fp addition chain for SQRT ~ 458 operations
  6. line 16. An Fp addition chain for inversion, but it can be merged with the previous SQRT computation.

Hence your implementation seems to do 33% less operations on paper.
Closing the issue.

@mratsim mratsim closed this as completed Sep 3, 2020
@dot-asm
Copy link
Collaborator

dot-asm commented Dec 3, 2020

As it turns out, it's possible to omit all the checks and leave just a pair of sqrt_fp calls. This amounts to ~2.5x improvement for sqrt_fp2 itself, and >40% for hash-to-[g2-]curve. See new sqrt.c:-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants