Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BN254-Snarks: bad performance on Fp4 squaring #154

Closed
mratsim opened this issue Feb 7, 2021 · 0 comments · Fixed by #155
Closed

BN254-Snarks: bad performance on Fp4 squaring #154

mratsim opened this issue Feb 7, 2021 · 0 comments · Fixed by #155
Labels
bug 🪲 Something isn't working performance 🏁

Comments

@mratsim
Copy link
Owner

mratsim commented Feb 7, 2021

For some reason (multiplication by its super large 9+i non-residue?)
BN254-Snarks is very slow on Fp4

image

Suspect 1:

Suspect 2:

The general squaring in quadratic field:

func square_generic(r: var QuadraticExt, a: QuadraticExt) =
## Return a² in ``r``
## ``r`` is initialized/overwritten
# Algorithm (with β the non-residue in the base field)
#
# (c0, c1)² => (c0 + c1 w)²
# => c0² + 2 c0 c1 w + c1²w²
# => c0² + β c1² + 2 c0 c1 w
# => (c0² + β c1², 2 c0 c1)
# We have 2 squarings and 1 multiplication in the base field
# which are significantly more costly than additions.
# For example when construction 𝔽p12 from 𝔽p6:
# - 4 limbs like BN254: multiplication is 20x slower than addition/substraction
# - 6 limbs like BLS12-381: multiplication is 28x slower than addition/substraction
#
# We can save operations with one of the following expressions
# of c0² + β c1² and noticing that c0c1 is already computed for the "y" coordinate
#
# Alternative 1:
# c0² + β c1² <=> (c0 - c1)(c0 - β c1) + β c0c1 + c0c1
#
# Alternative 2:
# c0² + β c1² <=> (c0 + c1)(c0 + β c1) - β c0c1 - c0c1
mixin prod
var v0 {.noInit.}, v1 {.noInit.}: typeof(r.c0)
# v1 <- (c0 + β c1)
v1.prod(a.c1, NonResidue)
v1 += a.c0
# v0 <- (c0 + c1)(c0 + β c1)
v0.sum(a.c0, a.c1)
v0 *= v1
# v1 <- c0 c1
v1.prod(a.c0, a.c1)
# aliasing: a unneeded now
# r0 = (c0 + c1)(c0 + β c1) - c0c1
v0 -= v1
# r1 = 2 c0c1
r.c1.double(v1)
# r0 = (c0 + c1)(c0 + β c1) - c0c1 - β c0c1
v1 *= NonResidue
r.c0.diff(v0, v1)

The basic expression is (c0² + β c1², 2 c0 c1)

We can either minimize Mul/Squarings, requiring only 2 multiplications by rewriting to:

  • r0 = (c0 + c1)(c0 + β c1) - c0c1 - β c0c1
  • r1 = 2 c0c1

or only use squarings by rewritting to

  • r0 = a0² + β a1²
  • r1 = (a0 + a1)² - a0² - a1²

The first expression requires 2 multiplication by non-residue.

@mratsim mratsim added bug 🪲 Something isn't working performance 🏁 labels Feb 7, 2021
mratsim added a commit that referenced this issue Feb 7, 2021
mratsim added a commit that referenced this issue Feb 8, 2021
@mratsim mratsim mentioned this issue Feb 8, 2021
9 tasks
mratsim added a commit that referenced this issue Feb 9, 2021
mratsim added a commit that referenced this issue Feb 9, 2021
mratsim added a commit that referenced this issue Feb 9, 2021
* consistent naming for dbl-width

* Isolate double-width Fp2 mul

* Implement double-width complex multiplication

* Lay out Fp4 double-width mul

* Off by p in square Fp4 as well :/

* less copies and stack space in addition chains

* Address #154 partly

* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)

* Fix typo

* better assembly scheduling for add/sub

* Double-width -> Double-precision

* Unred -> Unr

* double-precision modular addition

* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits

* Complete the double-precision implementation

* Use double-precision path for Fp4 squaring and mul

* remove mixin annotations

* Lazy reduction in Fp4 prod

* Fix assembly for sum2xMod

* Assembly for double-precision negation

* reduce white spaces in pairing benchmarks

* ADX implies BMI2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working performance 🏁
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant