Endomorphism acceleration for Scalar Multiplication #44

mratsim · 2020-06-14T13:00:17Z

This PR introduces constant-time GLV (Gallant-Lambert-Vanstone) acceleration for scalar multiplication.

The implementation closely follows Algorithms 1 and 2 from the paper:

Efficient and Secure Algorithms for GLV-Based Scalar Multiplication and their Implementation on GLV-GLS Curves (Extended Version)
Armando Faz-Hernández, Patrick Longa, Ana H. Sánchez, 2013
https://eprint.iacr.org/2013/158.pdf
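For readers who do not have the paper at hand, here is a minimal sketch of the idea behind the two algorithms: the sign-aligned recoding of the mini-scalars, followed by the scalar multiplication loop over a 2-entry lookup table. It is written in Python rather than Nim, it is not constant-time, and it is not the code from this PR. All names (`recode_glv_sac`, `scalar_mul_2d`) are hypothetical, and the additive group of integers modulo `r` stands in for the curve group, with the endomorphism φ modeled as multiplication by a constant. It also assumes the first mini-scalar is odd and both are non-negative; the paper handles the general case with a final correction.

```python
def recode_glv_sac(k0, others, l):
    """Sign-aligned column recoding (toy sketch, NOT constant-time).

    k0 must be odd with 0 < k0 < 2**l; every k in `others` must satisfy
    0 <= k <= 2**(l-1).
    Returns (signs, columns) where signs[i] is in {1, -1} and
    columns[j][i] is in {0, signs[i]}.
    """
    assert k0 % 2 == 1 and 0 < k0 < 2**l
    # Digit i of the sign-aligner is +1 if bit i+1 of k0 is set, else -1;
    # the top digit is always +1.
    signs = [2 * ((k0 >> (i + 1)) & 1) - 1 for i in range(l - 1)] + [1]
    columns = []
    for k in others:
        assert 0 <= k <= 2**(l - 1)
        digits = []
        for i in range(l):
            d = signs[i] * (k & 1)      # 0, or the sign of the aligner digit
            digits.append(d)
            k = (k >> 1) - (d >> 1)     # d >> 1 is floor(d/2): -1 if d == -1, else 0
        assert k == 0
        columns.append(digits)
    return signs, columns


def scalar_mul_2d(k1, k2, P, phiP, r, l):
    """Compute k1*P + k2*phi(P) in the toy group (Z_r, +).

    Uses l-1 doublings and l-1 additions, with a 2-entry lookup table.
    """
    signs, (col,) = recode_glv_sac(k1, [k2], l)
    table = [P % r, (P + phiP) % r]     # table[u] = P + u*phi(P)
    Q = table[abs(col[l - 1])]          # the top sign is always +1
    for i in range(l - 2, -1, -1):
        Q = (2 * Q) % r                               # doubling
        Q = (Q + signs[i] * table[abs(col[i])]) % r   # signed table lookup + add
    return Q
```

Because every digit of the second mini-scalar is either 0 or equal to the sign of the first mini-scalar's digit, each loop iteration needs a single table lookup and a single conditional negation, which is what keeps the table at 2 entries in the 2-dimensional case.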

The lattice decompositions for the BN and BLS curve families follow Chapter 6 of:

Guide to Pairing-based Cryptography
Chapter 6: Scalar Multiplication and Exponentiation in Pairing Groups
Joppe Bos, Craig Costello, Michael Naehrig
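As an illustration of what a lattice decomposition does, the sketch below splits a BLS12-381 scalar `k` into two half-size mini-scalars `k1`, `k2` with `k ≡ k1 + k2·λ (mod r)` using rounding against a short lattice basis. It is a Python sketch under stated assumptions, not the Nim code of this PR: the curve parameter `x` is the well-known BLS12-381 value, `λ = x² - 1` is one of the two roots of `X² + X + 1` modulo `r` (which root matches a given φ depends on the chosen cube root of unity in Fp), and the function names are hypothetical. It uses arbitrary-precision signed integers and is not constant-time.

```python
x = -0xd201000000010000        # BLS12-381 curve parameter
r = x**4 - x**2 + 1            # order of the prime-order subgroup
lam = x**2 - 1                 # satisfies lam^2 + lam + 1 == r, hence 0 mod r
assert (lam * lam + lam + 1) % r == 0

# Short basis of the lattice {(a, b) : a + b*lam = 0 mod r}
V1 = (-lam, 1)                 # -lam + 1*lam = 0
V2 = (1, lam + 1)              # 1 + (lam + 1)*lam = r


def round_div(n, d):
    """Nearest-integer division with exact integer arithmetic."""
    return (2 * n + d) // (2 * d)


def decompose(k):
    """Return (k1, k2) with k1 + k2*lam = k (mod r) and |k1|, |k2| < 2**128."""
    det = V1[0] * V2[1] - V2[0] * V1[1]
    # Solve (k, 0) = b1*V1 + b2*V2 over the rationals, then round.
    b1 = round_div(k * V2[1], det)
    b2 = round_div(-k * V1[1], det)
    k1 = k - b1 * V1[0] - b2 * V2[0]
    k2 = -b1 * V1[1] - b2 * V2[1]
    return k1, k2
```

Note that `k1` and `k2` can come out negative, which is why signed big integers (or a separate sign flag plus conditional negation of the point) are needed in a fixed-width implementation.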

Performance

- 20% speed increase over constant-time scalar multiplication with window of size 4
- Stack space usage divided by 8 (only 2 items in the scratchspace compared to 16)

vs MCL

As measured in status-im/nim-blscurve#47, we are now 2x slower than MCL on x86.
This can be explained by 2 things:

- Usage of faster incomplete addition formulas in MCL. MCL uses Jacobian coordinates, where adding a point to itself, to its opposite, or to the point at infinity must be special-cased; the incomplete formulas are also significantly faster (1.4x ratio).
- Usage of MULX/ADCX/ADOX in field multiplication on x86. Using dedicated instructions for large integer arithmetic has a compounding effect on elliptic curve operations.

vs Milagro

As measured in status-im/nim-blscurve#47 we are 3x faster than Milagro.
Note that we use the same complete projective formulas; however, Constantine is constant-time from the ground up while Milagro has some non-constant-time field operations. Constantine also uses uint128 where possible and is implemented with carries, while Milagro uses lazy reductions.

PR future improvements

The acceleration is hardcoded for BN254 and BLS12-381 at the moment; several improvements are needed in subsequent PRs:

- Tests are hardcoded; there is no fuzzing or property-based testing because the endomorphisms require points to be in the proper subgroup of the curve, so we need to clear the cofactor of randomly generated points. Clearing the cofactor should be compatible with the hash-to-curve IETF draft (https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-08#section-7), in particular for BLS. Scalar multiplication by 1-u is suitable (Wahby, Boneh, 2019, Fast and simple constant-time hashing to the BLS12-381 elliptic curve, https://eprint.iacr.org/2019/403.pdf).
- The lattice decomposition should go in a curves_family or similar file in config.
- The lattice decomposition could be cleaned up with support for negative bigints. Support can be added by defining type SignedBigInt[bits] = BigInt[bits+1].
- The sage scripts fail on the very last assert: https://github.com/mratsim/constantine/blob/endomorphism-accel/sage/lattice_decomposition_bn254_snarks_g1.sage#L132-L178 (but the implementation does work, so I expect some bithacks trouble).

- The recoding can be significantly cleaned up by adopting the recommendation on p. 6 of the paper.

In particular, using 2 bits for encoding {-1, 0, 1} is unnecessary

constantine/constantine/elliptic/ec_endomorphism_accel.nim

Lines 108 to 131 in e9e84ab

    
    type
      SignExtender = object
        ## Uses C builtin types sign extension to sign extend 2-bit to 8-bit
        ## in a portable way as sign extension is automatic for builtin types
        ## http://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend
        digit {.bitsize:2.}: int8

    # TODO: use unsigned to avoid checks and potentially leaking secrets
    #       or push checks off (or prove that checks can be elided once Nim has Z3 in the compiler)
    proc `[]`(recoding: Recoded,
              digitIdx: int): int8 {.inline.}=
      ## 0 <= digitIdx < LengthInDigits
      ## returns digit ∈ {0, 1, −1}
      const len = Recoded.LengthInDigits
      assert digitIdx < len

      let slot = distinctBase(recoding)[
        len-1 - (digitIdx shr Shift)
      ]
      let recoded = slot shr (BitSize*(digitIdx and ByteMask)) and DigitMask
      var signExtender: SignExtender
      # Hack with C assignment that return values
      {.emit: [result, " = ", signExtender, ".digit = ", recoded, ";"].}
      # " # Fix highlighting bug in VScode

This is because the sign is already encoded in the first miniscalar, so we only need {0, 1}, which divides the memory cost by 2.
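To make the memory argument concrete, here is a small Python sketch comparing the two encodings. It is an illustration only: the helper names are hypothetical and do not correspond to Constantine's `Recoded` type. The 2-bit variant stores each digit in two's complement and sign-extends on read; the 1-bit variant stores only whether the digit is non-zero and takes the sign from the matching digit of the first mini-scalar.

```python
def sign_extend_2bit(x):
    """Interpret the low 2 bits of x as two's complement: 0b11 -> -1, 0b01 -> 1."""
    x &= 0b11
    return (x ^ 0b10) - 0b10


def pack(values, bits):
    """Pack small integers into one word, `bits` bits per value, lowest index first."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (bits * i)
    return word


def digit_2bit(word, i):
    """Read digit i from the 2-bit encoding (sign stored with the digit)."""
    return sign_extend_2bit(word >> (2 * i))


def digit_1bit(word, i, sign):
    """Read digit i from the 1-bit encoding; the sign comes from the first mini-scalar."""
    return sign * ((word >> i) & 1)


# Example: signs of the first mini-scalar and digits of a second one,
# where every digit is either 0 or equal to the corresponding sign.
signs = [1, -1, -1, 1]
digits = [1, 0, -1, 1]

word2 = pack(digits, 2)                    # 2 bits per digit
word1 = pack([abs(d) for d in digits], 1)  # 1 bit per digit
```

With the 1-bit encoding the stored values are unsigned, so no signed 2-bit field or sign-extension trick is needed when reading a digit.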

- The previous point would allow the library to use uint everywhere and avoid silently adding overflow checks, which might leak secret data.
- Some buffers don't need to be zero-initialized.
- The lookup table can use simultaneous inversions so that we can use the fast mixed Projective-Affine formulas within the scalar multiplication loop.
- On G1, with a 2-dimensional decomposition, the lookup table is small (2 curve points), so we can use a window of 2 or 3 (especially with affine coordinates), with the following estimated speedups:
  - GLV scalarmul on 254-bit scalar --> 127 doublings + 127 additions (from table lookup)
  - With window of size 2 --> 127 doublings + 64 additions (-25% operations)
  - With window of size 3 --> 127 doublings + 43 additions (-33% operations)
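The simultaneous inversion mentioned in the list above is usually done with Montgomery's trick: n field elements are inverted with a single inversion plus about 3(n-1) multiplications. The Python sketch below shows the trick on integers modulo a prime; it is an illustration under assumptions, not code from this PR. The function name `batch_inverse` is hypothetical, all inputs are assumed non-zero, and a real implementation would apply it to the Z coordinates of the projective table entries before converting them to affine.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo the prime p with one inversion.

    All values are assumed to be non-zero modulo p.
    """
    n = len(values)
    # prefix[i] = values[0] * ... * values[i-1] mod p
    prefix = [1] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p

    # Single inversion of the full product (Fermat's little theorem, p prime).
    inv_running = pow(prefix[n], p - 2, p)

    out = [0] * n
    for i in range(n - 1, -1, -1):
        # inv_running == (values[0] * ... * values[i])^-1 at this point
        out[i] = inv_running * prefix[i] % p
        inv_running = inv_running * values[i] % p
    return out
```

For a 2-entry table the saving is small, but it grows with the table size, which is relevant if a window of 2 or 3 is adopted.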

mratsim added 18 commits June 7, 2020 19:40

Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al

99a4d8c

precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug nim-lang/Nim#14585

eff8823

Add the φ-accelerated lookup table builder

5925e98

Add a dedicated bithacks file

2213d1b

cosmetic import consistency

f4c4682

Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions

cb7ab3e

remove binary

3a43f11

Add the GLV precomputations to the sage scripts

082edaa

You can't avoid it, bigint multiplication is needed at one point

fbcf219

Add bigint multiplication discarding some low words

2e47d37

Implement the lattice decomposition in sage

50df2f5

Proper decomposition for BN254

4862f23

Prepare the code for a new scalar mul

b84657f

We compile, and now debugging hunt

3500bbe

More helpers to debug GLV scalar Mul

09410f6

Fix conditional negation

4ecd126

Endomorphism accelerated scalar mul working for BN254 curve

10ded15

Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)

e9e84ab

mratsim added the constant time ⏳ and performance 🏁 labels Jun 14, 2020

fix nimble test script after bench rename

1ad1d4f

mratsim merged commit 2613356 into master Jun 14, 2020

mratsim deleted the endomorphism-accel branch June 14, 2020 14:57

mratsim mentioned this pull request Jun 23, 2020

[Optim] Accelerated scalar multiplication supranational/blst#1

Closed

mratsim mentioned this pull request Aug 31, 2020

Frobenius endomorphism ψ = φ⁻¹ πₚ φ (psi = untwist-Frobenius-Twist) #78

Merged
