Reorg curves constants #94

Merged on Sep 27, 2020 (14 commits)
146 changes: 73 additions & 73 deletions README.md
@@ -9,8 +9,20 @@

This library provides constant-time implementation of elliptic curve cryptography.

> Warning ⚠️: The library is in development state and cannot be used at the moment
> except as a showcase or to start a discussion on modular big integers internals.
The implementation is accompanied by SAGE code, used as a reference implementation and as a test vector generator before the high-speed implementation is written.

> The library is in development state and high-level wrappers or example protocols are not available yet.

## Target audience

The library aims to be a portable, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system.

The library focuses on following properties:
- constant-time (not leaking secret data via side-channels)
- performance
- generated code size, datatype size and stack usage

in this order
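
As a minimal illustration of the constant-time property listed first, the sketch below selects between two words without branching on the flag. The names `ctMask` and `ctSelect` are hypothetical and are not taken from the library; a hardened implementation would also need to stop the compiler from reintroducing branches.

```nim
# Hypothetical helpers for illustration only; not Constantine's API.

func ctMask(flag: bool): uint64 =
  ## All ones when `flag` is true, all zeros otherwise.
  ## Unsigned subtraction wraps around, so 0 - 1 yields 0xFFFF_FFFF_FFFF_FFFF.
  0'u64 - uint64(ord(flag))

func ctSelect(flag: bool, a, b: uint64): uint64 =
  ## Returns `a` when `flag` is true and `b` otherwise,
  ## using only bitwise operations on the operands.
  let mask = ctMask(flag)
  (a and mask) or (b and (not mask))

when isMainModule:
  doAssert ctSelect(true, 5'u64, 9'u64) == 5'u64
  doAssert ctSelect(false, 5'u64, 9'u64) == 9'u64
```

Both operands are read and combined regardless of the flag, so the sequence of instructions does not depend on its value.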

## Installation

@@ -31,50 +43,39 @@ This can be deactivated with `"-d:ConstantineASM=false"`:
- at a missed opportunity on recent CPUs that support MULX/ADCX/ADOX instructions (~60% faster than Clang).
- There is a 2.4x perf ratio between using plain GCC vs GCC with inline assembly.

## Target audience

The library aims to be a portable, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system.

The library focuses on following properties:
- constant-time (not leaking secret data via side-channels)
- performance
- generated code size, datatype size and stack usage

in this order

## Curves supported

At the moment the following curves are supported, adding a new curve only requires adding the prime modulus
and its bitsize in [constantine/config/curves.nim](constantine/config/curves_declaration.nim).

The following curves are configured:

> Note: At the moment, finite field arithmetic is fully supported
> but elliptic curve arithmetic is work-in-progress.

### ECDH / ECDSA curves
### ECDH / ECDSA / EdDSA curves

WIP:
- NIST P-224
- Curve25519
- NIST P-256 / Secp256r1
- Secp256k1 (Bitcoin, Ethereum 1)

### Pairing-Friendly curves

Supports:
- [x] Field arithmetics
- [x] Curve arithmetic
- [x] Pairing
- [ ] Multi-Pairing
- [ ] Hash-To-Curve

Families:
- BN: Barreto-Naerig
- BN: Barreto-Naehrig
- BLS: Barreto-Lynn-Scott
- FKM: Fotiadis-Konstantinou-Martindale

Curves:
- BN254_Nogami
- BN254_Snarks (Zero-Knowledge Proofs, Snarks, Starks, Zcash, Ethereum 1)
- BLS12-377 (Zexe)
- BLS12-381 (Algorand, Chia Networks, Dfinity, Ethereum 2, Filecoin, Zcash Sapling)
- BN446
- FKM12-447
- BLS12-461
- BN462

## Security

@@ -141,73 +142,72 @@ The previous implementation was 15x slower and one of the key optimizations
was changing the elliptic curve cryptography backend.
It had a direct implication on hardware cost and/or cloud computing resources required.

## Measuring performance
### Measuring performance

To measure the performance of Constantine

```bash
git clone https://github.com/mratsim/constantine
nimble bench_fp # Using Assembly (+ GCC)
nimble bench_fp_clang # Using Clang only
nimble bench_fp_gcc # Using Clang only (very slow)
nimble bench_fp # Using default compiler + Assembly
nimble bench_fp_clang # Using Clang + Assembly (recommended)
nimble bench_fp_gcc # Using GCC + Assembly (very slow)
nimble bench_fp_clang_noasm # Using Clang only
nimble bench_fp_gcc # Using GCC only (slowest)
nimble bench_fp2
# ...
nimble bench_ec_g1
nimble bench_ec_g2
nimble bench_pairing_bn254_nogami
nimble bench_pairing_bn254_snarks
nimble bench_pairing_bls12_377
nimble bench_pairing_bls12_381
```

"Unsafe" lines use a non-constant-time algorithm.

As mentioned in the [Compiler caveats](#compiler-caveats) section, GCC is up to 2x slower than Clang due to mishandling of carries and register usage.

On my machine, for selected benchmarks on the prime field for popular pairing-friendly curves.
On my machine (i9-9980XE), for selected benchmarks with Clang + Assembly:

```
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Line double BLS12_381 649350.649 ops/s 1540 ns/op 4617 CPU cycles (approx)
Line add BLS12_381 482858.522 ops/s 2071 ns/op 6211 CPU cycles (approx)
Mul 𝔽p12 by line xy000z BLS12_381 543478.261 ops/s 1840 ns/op 5518 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Final Exponentiation Easy BLS12_381 39411.973 ops/s 25373 ns/op 76119 CPU cycles (approx)
Final Exponentiation Hard BLS12 BLS12_381 2141.603 ops/s 466940 ns/op 1400833 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Miller Loop BLS12 BLS12_381 2731.576 ops/s 366089 ns/op 1098278 CPU cycles (approx)
Final Exponentiation BLS12 BLS12_381 2033.045 ops/s 491873 ns/op 1475634 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Pairing BLS12 BLS12_381 1131.391 ops/s 883868 ns/op 2651631 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

```
Compiled with GCC
Optimization level =>
no optimization: false
release: true
danger: true
inline assembly: true
Using Constantine with 64-bit limbs
Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them.
i.e. a 20% overclock will be about 20% off (assuming no dynamic frequency scaling)

=================================================================================================================

-------------------------------------------------------------------------------------------------------------------------------------------------
Addition Fp[BN254_Snarks] 333333333.333 ops/s 3 ns/op 9 CPU cycles (approx)
Substraction Fp[BN254_Snarks] 500000000.000 ops/s 2 ns/op 8 CPU cycles (approx)
Negation Fp[BN254_Snarks] 1000000000.000 ops/s 1 ns/op 3 CPU cycles (approx)
Multiplication Fp[BN254_Snarks] 71428571.429 ops/s 14 ns/op 44 CPU cycles (approx)
Squaring Fp[BN254_Snarks] 71428571.429 ops/s 14 ns/op 44 CPU cycles (approx)
Inversion (constant-time Euclid) Fp[BN254_Snarks] 122579.063 ops/s 8158 ns/op 24474 CPU cycles (approx)
Inversion via exponentiation p-2 (Little Fermat) Fp[BN254_Snarks] 153822.489 ops/s 6501 ns/op 19504 CPU cycles (approx)
Square Root + square check (constant-time) Fp[BN254_Snarks] 153491.942 ops/s 6515 ns/op 19545 CPU cycles (approx)
Exp curve order (constant-time) - 254-bit Fp[BN254_Snarks] 104580.632 ops/s 9562 ns/op 28687 CPU cycles (approx)
Exp curve order (Leak exponent bits) - 254-bit Fp[BN254_Snarks] 153798.831 ops/s 6502 ns/op 19506 CPU cycles (approx)
-------------------------------------------------------------------------------------------------------------------------------------------------
Addition Fp[BLS12_381] 250000000.000 ops/s 4 ns/op 14 CPU cycles (approx)
Substraction Fp[BLS12_381] 250000000.000 ops/s 4 ns/op 13 CPU cycles (approx)
Negation Fp[BLS12_381] 1000000000.000 ops/s 1 ns/op 4 CPU cycles (approx)
Multiplication Fp[BLS12_381] 35714285.714 ops/s 28 ns/op 84 CPU cycles (approx)
Squaring Fp[BLS12_381] 35714285.714 ops/s 28 ns/op 85 CPU cycles (approx)
Inversion (constant-time Euclid) Fp[BLS12_381] 43763.676 ops/s 22850 ns/op 68552 CPU cycles (approx)
Inversion via exponentiation p-2 (Little Fermat) Fp[BLS12_381] 63983.620 ops/s 15629 ns/op 46889 CPU cycles (approx)
Square Root + square check (constant-time) Fp[BLS12_381] 63856.960 ops/s 15660 ns/op 46982 CPU cycles (approx)
Exp curve order (constant-time) - 255-bit Fp[BLS12_381] 68535.399 ops/s 14591 ns/op 43775 CPU cycles (approx)
Exp curve order (Leak exponent bits) - 255-bit Fp[BLS12_381] 93222.709 ops/s 10727 ns/op 32181 CPU cycles (approx)
-------------------------------------------------------------------------------------------------------------------------------------------------
Notes:
- Compilers:
Compilers are severely limited on multiprecision arithmetic.
Inline Assembly is used by default (nimble bench_fp).
Bench without assembly can use "nimble bench_fp_gcc" or "nimble bench_fp_clang".
GCC is significantly slower than Clang on multiprecision arithmetic due to catastrophic handling of carries.
- The simplest operations might be optimized away by the compiler.
- Fast Squaring and Fast Multiplication are possible if there are spare bits in the prime representation (i.e. the prime uses 254 bits out of 256 bits)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EC Add G1 ECP_SWei_Proj[Fp[BLS12_381]] 2118644.068 ops/s 472 ns/op 1416 CPU cycles (approx)
EC Mixed Addition G1 ECP_SWei_Proj[Fp[BLS12_381]] 2439024.390 ops/s 410 ns/op 1232 CPU cycles (approx)
EC Double G1 ECP_SWei_Proj[Fp[BLS12_381]] 3448275.862 ops/s 290 ns/op 871 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EC ScalarMul G1 (unsafe reference DoubleAdd) ECP_SWei_Proj[Fp[BLS12_381]] 7147.094 ops/s 139917 ns/op 419756 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EC ScalarMul Generic G1 (window = 2, scratchsize = 4) ECP_SWei_Proj[Fp[BLS12_381]] 5048.975 ops/s 198060 ns/op 594188 CPU cycles (approx)
EC ScalarMul Generic G1 (window = 3, scratchsize = 8) ECP_SWei_Proj[Fp[BLS12_381]] 7148.269 ops/s 139894 ns/op 419685 CPU cycles (approx)
EC ScalarMul Generic G1 (window = 4, scratchsize = 16) ECP_SWei_Proj[Fp[BLS12_381]] 8112.735 ops/s 123263 ns/op 369791 CPU cycles (approx)
EC ScalarMul Generic G1 (window = 5, scratchsize = 32) ECP_SWei_Proj[Fp[BLS12_381]] 8464.534 ops/s 118140 ns/op 354424 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EC ScalarMul G1 (endomorphism accelerated) ECP_SWei_Proj[Fp[BLS12_381]] 9679.418 ops/s 103312 ns/op 309939 CPU cycles (approx)
EC ScalarMul Window-2 G1 (endomorphism accelerated) ECP_SWei_Proj[Fp[BLS12_381]] 13089.348 ops/s 76398 ns/op 229195 CPU cycles (approx)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```
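
The columns of the benchmark output are related by simple arithmetic, sketched below under the assumption that cycles are derived from the nominal clock as the output's own warning states. The helper names `opsPerSecond` and `approxCycles` are hypothetical; the input figures are the "Line double" row and the 3.00 GHz nominal clock quoted above.

```nim
# Hypothetical helpers; the figures come from the benchmark output above.

func opsPerSecond(nsPerOp: float): float =
  ## Throughput implied by a latency given in nanoseconds per operation.
  1e9 / nsPerOp

func approxCycles(nsPerOp, nominalGHz: float): float =
  ## Cycle estimate at the nominal clock: 1 GHz is one cycle per nanosecond.
  nsPerOp * nominalGHz

when isMainModule:
  # "Line double" on BLS12_381: 1540 ns/op on a CPU with a 3.00 GHz nominal clock.
  echo opsPerSecond(1540.0)
  echo approxCycles(1540.0, 3.0)
```

The product of the rounded figures is 4620 cycles, close to the reported 4617; the small gap is presumably because the ns/op column is itself rounded. The same relation explains why Turbo-Boost or overclocking skews the cycle column: the real clock is then no longer the nominal one.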




### Compiler caveats

Unfortunately compilers and in particular GCC are not very good at optimizing big integers and/or cryptographic code even when using intrinsics like `addcarry_u64`.
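
To make the pattern concrete, here is a minimal portable sketch of the multi-limb addition with carry propagation that big-integer code relies on. `Limbs` and `addWithCarry` are hypothetical names, not the library's API, and the comparison-based carry detection is one common portable idiom rather than the approach used in this repository. Ideally a compiler would lower the loop to a chain of add-with-carry instructions; the caveat above is that this often does not happen.

```nim
# Hypothetical sketch; `Limbs` and `addWithCarry` are not Constantine's API.

type Limbs = array[4, uint64]  # 256-bit integer, least significant limb first

func addWithCarry(a, b: Limbs): tuple[limbs: Limbs, carry: uint64] =
  ## Adds two 4-limb integers and returns the final carry-out (0 or 1).
  var carry = 0'u64
  for i in 0 ..< a.len:
    let t = a[i] + b[i]              # wraps modulo 2^64
    let c1 = uint64(ord(t < a[i]))   # 1 if a[i] + b[i] overflowed
    let s = t + carry
    let c2 = uint64(ord(s < t))      # 1 if adding the incoming carry overflowed
    result.limbs[i] = s
    carry = c1 or c2                 # at most one of the two can be set
  result.carry = carry
```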
4 changes: 0 additions & 4 deletions benchmarks/bench_ec_g1.nim
@@ -37,10 +37,6 @@ const AvailableCurves = [
# Secp256k1,
BLS12_377,
BLS12_381,
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
]

proc main() =
4 changes: 0 additions & 4 deletions benchmarks/bench_ec_g2.nim
@@ -38,10 +38,6 @@ const AvailableCurves = [
# Secp256k1,
BLS12_377,
BLS12_381,
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
]

proc main() =
4 changes: 0 additions & 4 deletions benchmarks/bench_fp.nim
@@ -35,10 +35,6 @@ const AvailableCurves = [
# Secp256k1,
BLS12_377,
BLS12_381,
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
]

proc main() =
4 changes: 0 additions & 4 deletions benchmarks/bench_fp12.nim
@@ -31,10 +31,6 @@ const AvailableCurves = [
BN254_Snarks,
BLS12_377,
BLS12_381
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
]

proc main() =
4 changes: 0 additions & 4 deletions benchmarks/bench_fp2.nim
@@ -31,10 +31,6 @@ const AvailableCurves = [
BN254_Snarks,
BLS12_377,
BLS12_381
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
]

proc main() =
6 changes: 1 addition & 5 deletions benchmarks/bench_fp6.nim
@@ -30,11 +30,7 @@ const AvailableCurves = [
BN254_Nogami,
BN254_Snarks,
BLS12_377,
BLS12_381
# BN446,
# FKM12_447,
# BLS12_461,
# BN462
BLS12_381,
]

proc main() =
3 changes: 1 addition & 2 deletions constantine.nimble
@@ -33,8 +33,7 @@ const testDesc: seq[tuple[path: string, useGMP: bool]] = @[
("tests/t_finite_fields_sqrt.nim", false),
("tests/t_finite_fields_powinv.nim", false),
("tests/t_finite_fields_vs_gmp.nim", true),
# Precompute
("tests/t_precomputed", false),
("tests/t_fp_cubic_root.nim", false),
# Double-width finite fields
("tests/t_finite_fields_double_width.nim", false),
# Towers of extension fields
@@ -10,8 +10,8 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives
../../config/common,
../../primitives

# ############################################################
#
@@ -10,9 +10,9 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives,
./limbs
../../config/common,
../../primitives,
../limbs

# ############################################################
#
@@ -10,9 +10,9 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives,
./limbs,
../../config/common,
../../primitives,
../limbs,
./limbs_asm_montred_x86

# ############################################################
@@ -10,9 +10,9 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives,
./limbs,
../../config/common,
../../primitives,
../limbs,
./limbs_asm_montred_x86

# ############################################################
@@ -10,9 +10,9 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives,
./limbs
../../config/common,
../../primitives,
../limbs

# ############################################################
#
@@ -10,9 +10,9 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives,
./limbs,
../../config/common,
../../primitives,
../limbs,
./limbs_asm_montred_x86

# ############################################################
@@ -10,8 +10,8 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives
../../config/common,
../../primitives

# ############################################################
#
@@ -10,8 +10,8 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives
../../config/common,
../../primitives

# ############################################################
#
@@ -10,8 +10,8 @@ import
# Standard library
std/macros,
# Internal
../config/common,
../primitives
../../config/common,
../../primitives

# ############################################################
#