Endomorphism acceleration for Scalar Multiplication #44

Merged (19 commits) on Jun 14, 2020

mratsim (Owner) commented Jun 14, 2020

This PR introduces constant-time GLV (Gallant-Lambert-Vanstone) acceleration for scalar multiplication.

The implementation closely follows Algorithms 1 and 2 from the paper

  • Efficient and Secure Algorithms for GLV-Based Scalar
    Multiplication and their Implementation on GLV-GLS
    Curves (Extended Version)
    Armando Faz-Hernández, Patrick Longa, Ana H. Sánchez, 2013
    https://eprint.iacr.org/2013/158.pdf

The lattice decompositions for the BN and BLS curve families follow Chapter 6 of

  • Guide to Pairing-based Cryptography
    Chapter 6: Scalar Multiplication and Exponentiation in Pairing Groups
    Joppe Bos, Craig Costello, Michael Naehrig
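To make the idea behind GLV concrete, here is a minimal sketch on a toy curve, written in Python for brevity rather than in the library's Nim. Everything in it is an assumption chosen for illustration: the curve y^2 = x^3 + 2 over F_13 (19 points, prime order), the generator G, and the helper names add, mul, phi, decompose and glv_mul. It is not constant-time, and the brute-force decompose stands in for the lattice decomposition used in the PR.

```python
# Toy GLV demo on y^2 = x^3 + 2 over F_13 (19 points, prime order).
# All parameters are illustrative assumptions, far too small for real use.
P_MOD, R = 13, 19
BETA = 3        # primitive cube root of unity mod 13 (3^3 = 27 = 1)
G = (1, 4)      # on the curve: 4^2 = 16 = 3 = 1^3 + 2 (mod 13)

def add(p, q):
    """Affine addition; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                          # p + (-p)
    if p == q:
        s = 3 * x1 * x1 * pow(2 * y1 % P_MOD, -1, P_MOD)     # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)    # chord slope
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def mul(k, p):
    """Plain double-and-add (NOT constant-time); negative k negates p."""
    if k < 0:
        k, p = -k, (p[0], -p[1] % P_MOD)
    acc = None
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc

def phi(p):
    """Cheap endomorphism (x, y) -> (BETA * x, y): one field multiplication."""
    return (BETA * p[0] % P_MOD, p[1])

# phi acts on the subgroup as multiplication by some LAMBDA; find it by search.
LAMBDA = next(l for l in range(1, R) if mul(l, G) == phi(G))

def decompose(k):
    """Brute-force split k = k1 + k2 * LAMBDA (mod R) with small |k1|, |k2|."""
    cands = [(k1, k2) for k1 in range(-R, R + 1) for k2 in range(-R, R + 1)
             if (k1 + k2 * LAMBDA - k) % R == 0]
    return min(cands, key=lambda c: max(abs(c[0]), abs(c[1])))

def glv_mul(k, p):
    """k * p computed from two half-size scalar multiplications."""
    k1, k2 = decompose(k)
    return add(mul(k1, p), mul(k2, phi(p)))
```

For every scalar k, glv_mul(k, G) agrees with mul(k, G), while the two mini-scalars are much shorter than k. A real implementation interleaves the two multiplications so that they share the doublings, which is where the speedup comes from.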

Performance

  • 20% speed increase over constant-time scalar multiplication with window of size 4
  • Stack space usage divided by 8 (only 2 items in the scratchspace compared to 16)

vs MCL

As measured in status-im/nim-blscurve#47, we are now 2x slower than MCL on x86.
This can be explained by 2 things:

  1. Usage of faster incomplete addition formulas in MCL. MCL uses Jacobian coordinates, which require special-casing the addition of a point to itself, to its opposite, or to the point at infinity. The incomplete formulas are also significantly faster (1.4x ratio).
  2. Usage of MULX/ADCX/ADOX in field multiplication on x86. Using dedicated instructions for large integer arithmetic has a compounding effect on elliptic curve operations.
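The special-casing mentioned in the first point can be seen in a small sketch. The Python below uses the textbook Jacobian addition formula on a toy curve (y^2 = x^3 + 2 over F_13, an assumption made purely for illustration); the function names are hypothetical and this is not MCL's or Constantine's code. The formula works for distinct points but collapses to Z = 0 when a point is added to itself.

```python
P_MOD = 13  # toy field; curve y^2 = x^3 + 2 (illustrative assumption)

def jac_add_incomplete(p, q):
    """Textbook Jacobian addition, valid only for finite points with p != +-q."""
    x1, y1, z1 = p
    x2, y2, z2 = q
    u1, u2 = x1 * z2 * z2 % P_MOD, x2 * z1 * z1 % P_MOD
    s1, s2 = y1 * z2**3 % P_MOD, y2 * z1**3 % P_MOD
    h, r = (u2 - u1) % P_MOD, (s2 - s1) % P_MOD
    x3 = (r * r - h**3 - 2 * u1 * h * h) % P_MOD
    y3 = (r * (u1 * h * h - x3) - s1 * h**3) % P_MOD
    z3 = h * z1 * z2 % P_MOD   # h == 0 when both inputs share the same x
    return (x3, y3, z3)

def to_affine(p):
    """Convert (X, Y, Z) to (X/Z^2, Y/Z^3); Z == 0 encodes infinity."""
    x, y, z = p
    if z == 0:
        return None
    zi = pow(z, -1, P_MOD)
    return (x * zi * zi % P_MOD, y * zi**3 % P_MOD)

G = (1, 4, 1)    # affine (1, 4)
G2 = (2, 7, 1)   # affine (2, 7), which is 2*G on this curve

distinct = to_affine(jac_add_incomplete(G, G2))   # correct sum of G and 2*G
same = to_affine(jac_add_incomplete(G, G))        # None: wrongly "infinity"
```

Adding G to itself should give (2, 7), but the incomplete formula returns the point at infinity, so a caller has to detect that case and branch to a doubling. Such a branch depends on the operands, which is one reason a constant-time library may prefer complete formulas despite their cost.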

vs Milagro

As measured in status-im/nim-blscurve#47, we are 3x faster than Milagro.
Note that we use the same complete projective formulas; however, Constantine is constant-time from the ground up, while Milagro has some non-constant-time field operations. Also, Constantine uses uint128 where possible and is implemented with carries, while Milagro uses lazy reductions.

PR future improvements

The acceleration is hardcoded for BN254 and BLS12-381 at the moment. Several improvements are needed in subsequent PRs:

  • Tests are hardcoded; there is no fuzzing or property-based testing yet. The endomorphisms require points to be in the proper subgroup of the curve, so we need to clear the cofactor of randomly generated points. Clearing the cofactor should be compatible with the hash-to-curve IETF draft (https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-08#section-7), in particular for BLS. Scalar multiplication by 1-u is suitable (Wahby, Boneh, 2019, Fast and simple constant-time hashing to the BLS12-381 elliptic curve, https://eprint.iacr.org/2019/403.pdf)
  • The lattice decomposition should go in a curves_family or similar file in config
  • The lattice decomposition could be cleaned up with support for negative bigints. Support can be added by having type SignedBigInt[bits] = BigInt[bits+1]
  • The sage scripts fail, for some reason, at the very last assert: https://github.com/mratsim/constantine/blob/endomorphism-accel/sage/lattice_decomposition_bn254_snarks_g1.sage#L132-L178 (but the implementation does work, so I expect some bithacks trouble)
  • The recoding can be significantly cleaned up by adopting the recommendation on p. 6 of the paper. In particular, using 2 bits to encode {-1, 0, 1} is unnecessary: the sign is already encoded in the first miniscalar, so we only need {0, 1}, dividing the memory cost by 2. The current 2-bit digit accessor:

        type
          SignExtender = object
            ## Uses C builtin types sign extension to sign extend 2-bit to 8-bit
            ## in a portable way as sign extension is automatic for builtin types
            ## http://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend
            digit {.bitsize:2.}: int8

        # TODO: use unsigned to avoid checks and potentially leaking secrets
        #       or push checks off (or prove that checks can be elided once Nim has Z3 in the compiler)
        proc `[]`(recoding: Recoded,
                  digitIdx: int): int8 {.inline.}=
          ## 0 <= digitIdx < LengthInDigits
          ## returns digit ∈ {0, 1, −1}
          const len = Recoded.LengthInDigits
          assert digitIdx < len

          let slot = distinctBase(recoding)[
            len-1 - (digitIdx shr Shift)
          ]
          let recoded = slot shr (BitSize*(digitIdx and ByteMask)) and DigitMask
          var signExtender: SignExtender
          # Hack with C assignment that return values
          {.emit: [result, " = ", signExtender, ".digit = ", recoded, ";"].}
          # " # Fix highlighting bug in VScode
  • The previous point would allow the library to use uint everywhere and avoid silently adding overflow checks, which might leak secret data
  • We have some buffers that don't need to be zero-initialized
  • The lookup table can use simultaneous inversions so that we can use the fast mixed Projective-Affine formulas within the scalar multiplication loop
  • On G1, with a 2-dimensional decomposition, the lookup table is small (2 curve points), so we can use a window of 2 or 3 (especially with affine coordinates) with the following estimated speedups:
    • GLV scalarmul on 254-bit scalar --> 127 doublings + 127 additions (from table lookup)
    • With window of size 2 --> 127 doublings + 64 additions (-25% operations)
    • With window of size 3 --> 127 doublings + 43 additions (-33% operations)
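The window estimates above follow from simple counting, sketched here in Python. The model is an assumption made for illustration: one doubling per mini-scalar bit and one table-lookup addition per window, ignoring the cost of building the larger lookup table that a wider window needs. The names glv_ops and saving are hypothetical.

```python
from math import ceil

def glv_ops(scalar_bits=254, dims=2, window=1):
    """Return (doublings, additions) for a GLV scalar multiplication.

    Assumed model: the scalar is split into `dims` mini-scalars of
    ceil(scalar_bits / dims) bits; every bit costs one doubling and
    every `window` bits cost one addition from the lookup table.
    """
    mini_bits = ceil(scalar_bits / dims)
    return mini_bits, ceil(mini_bits / window)

def saving(window, **kwargs):
    """Fraction of point operations saved relative to window = 1."""
    base = sum(glv_ops(window=1, **kwargs))
    return 1 - sum(glv_ops(window=window, **kwargs)) / base

for w in (1, 2, 3):
    doublings, additions = glv_ops(window=w)
    print(f"window {w}: {doublings} doublings + {additions} additions "
          f"({saving(w):.0%} fewer operations)")
```

With the default 254-bit scalar and 2-dimensional decomposition this reproduces the 127 + 127, 127 + 64 and 127 + 43 figures quoted in the list.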

mratsim added the labels "constant time ⏳" (Enhancement is suitable for secret data) and "performance 🏁" on Jun 14, 2020