
Benchmarks #1

Open

mratsim opened this issue Jul 13, 2020 · 2 comments

@mratsim
Contributor

mratsim commented Jul 13, 2020

x86-64

nim c -r --passC:-g -d:danger --hints:off --warnings:off --verbosity:0 --outdir:build benchmarks/bench_all.nim
Warmup: 0.9026 s, result 224 (displayed to avoid compiler optimizing warmup away)


Compiled with GCC
Optimization level => no optimization: false | release: true | danger: true
Using Milagro with 64-bit limbs
Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz



⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them.
i.e. a 20% overclock will be about 20% off (assuming no dynamic frequency scaling)

=====================================================================================================================

Scalar multiplication G1 (255-bit)                             7649.939 ops/s       130720 ns/op       392165 cycles
Scalar multiplication G2 (255-bit)                             2973.783 ops/s       336272 ns/op      1008830 cycles
EC add G1                                                   1295336.788 ops/s          772 ns/op         2317 cycles
EC add G2                                                    452488.688 ops/s         2210 ns/op         6631 cycles
Pairing (Miller loop + Final Exponentiation)                   1315.289 ops/s       760289 ns/op      2280892 cycles
Hash to G2 (Draft #8)                                          3240.304 ops/s       308613 ns/op       925851 cycles

Broadwell CPUs (Intel, 2015), Ryzen CPUs (AMD, 2017) and later support the "ADX" instructions dedicated to big-integer arithmetic.
You might want to benchmark with --passC:-madx or --passC:"-march=native" to use them.
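Note that the cycle counts above are just ns/op scaled by the nominal clock (3.0 GHz for this i9-9980XE), which is why Turbo Boost and overclocking skew them. A minimal sanity check of that conversion, assuming the 3.0 GHz figure from the CPU string (the constant is not taken from the benchmark code):

```nim
# Reconstruct a cycle count from ns/op at the nominal clock:
# 3.0 GHz means 3 cycles per nanosecond.
let nominalGHz = 3.0
let nsPerOp = 130_720.0                # Scalar multiplication G1 above
echo int(nsPerOp * nominalGHz)         # 392160; the table reports 392165
```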

x86-64 + ADX instructions

nim c -r --passC:"-g -madx" -d:danger --hints:off --warnings:off --verbosity:0 --outdir:build benchmarks/bench_all.nim
Warmup: 0.9030 s, result 224 (displayed to avoid compiler optimizing warmup away)


Compiled with GCC
Optimization level => no optimization: false | release: true | danger: true
Using Milagro with 64-bit limbs
Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz



⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them.
i.e. a 20% overclock will be about 20% off (assuming no dynamic frequency scaling)

=====================================================================================================================

Scalar multiplication G1 (255-bit)                             9631.777 ops/s       103823 ns/op       311473 cycles
Scalar multiplication G2 (255-bit)                             3768.863 ops/s       265332 ns/op       796006 cycles
EC add G1                                                   1706484.642 ops/s          586 ns/op         1758 cycles
EC add G2                                                    598444.045 ops/s         1671 ns/op         5015 cycles
Pairing (Miller loop + Final Exponentiation)                   1639.054 ops/s       610108 ns/op      1830347 cycles
Hash to G2 (Draft #8)                                          4270.876 ops/s       234144 ns/op       702442 cycles

Broadwell CPUs (Intel, 2015), Ryzen CPUs (AMD, 2017) and later support the "ADX" instructions dedicated to big-integer arithmetic.
You might want to benchmark with --passC:-madx or --passC:"-march=native" to use them.
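To check whether a CPU actually exposes ADX before passing --passC:-madx, CPUID leaf 7 reports it. A minimal sketch, assuming GCC or Clang: it wraps `__get_cpuid_count` from `<cpuid.h>` (x86-64 only; this helper is not part of the repository):

```nim
proc getCpuidCount(leaf, subleaf: cuint,
                   eax, ebx, ecx, edx: ptr cuint): cint {.
  importc: "__get_cpuid_count", header: "<cpuid.h>".}

var eax, ebx, ecx, edx: cuint
if getCpuidCount(7, 0, addr eax, addr ebx, addr ecx, addr edx) != 0:
  # CPUID.(EAX=7, ECX=0):EBX bit 19 = ADX, bit 8 = BMI2 (MULX)
  echo "ADX:  ", (ebx and (cuint(1) shl 19)) != 0
  echo "BMI2: ", (ebx and (cuint(1) shl 8)) != 0
```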

Comparison

Compare with Milagro and MCL at status-im/nim-blscurve#47

(MCL JIT vs BLST)

  • Scalar mul G1: 200 kcycles vs 300 kcycles
  • Scalar mul G2: 400 kcycles vs 800 kcycles
  • Pairing: 2.2 Mcycles vs 1.8 Mcycles
  • Hash to G2: 467 kcycles vs 702 kcycles

Analysis:

Side-note on EC Add

MCL's EC add is not constant-time: there are branches to detect the point at infinity and the cases of adding a point to itself or to its opposite. BLST, by contrast, always handles all (add, double, infinity) cases.
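To make that concrete, here is a minimal sketch of a constant-time conditional copy, the primitive a branchless EC add is built on (illustrative only; BLST implements its equivalent in assembly, and the 4-limb size is just an example):

```nim
# Copy `src` into `dst` when `ctl` is true, without a data-dependent
# branch: `mask` is all ones when ctl is true, all zeroes otherwise.
func ccopy(dst: var array[4, uint64], src: array[4, uint64], ctl: bool) =
  let mask = 0'u64 - uint64(ctl)
  for i in 0 ..< dst.len:
    dst[i] = dst[i] xor (mask and (dst[i] xor src[i]))

# A variable-time add branches on "is infinity?" / "same point?";
# a constant-time add computes every candidate result and selects
# the right one with ccopy, so the trace is identical for all inputs.
var acc = [1'u64, 2, 3, 4]
acc.ccopy([5'u64, 6, 7, 8], ctl = true)
echo acc   # prints [5, 6, 7, 8]
```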

@dot-asm

dot-asm commented Jul 19, 2020

Just in case, for reference: among other things, performance is also about "perspectives" and priorities, most notably multi-processor scalability. This is why some components are not at 100% yet, and the keyword is "yet". However, this is not to say that feedback is not appreciated. It certainly is, as are new pointers and reminders :-) Thanks and cheers!

@mratsim
Contributor Author

mratsim commented Jul 20, 2020

> Just in case, for reference: among other things, performance is also about "perspectives" and priorities, most notably multi-processor scalability. This is why some components are not at 100% yet, and the keyword is "yet". However, this is not to say that feedback is not appreciated. It certainly is, as are new pointers and reminders :-) Thanks and cheers!

Thanks. From discussions with some of the Consensys ZK team during EthCC, they were indeed investigating an issue where they couldn't scale SNARKs beyond 16 cores and were looking for solutions. It seems to be an important issue for all zero-knowledge actors, as Loopring (which uses a completely different stack) was also only scalable up to 16 cores: https://medium.com/loopring-protocol/zksnark-prover-optimizations-3e9a3e5578c0

I'm not sure what the current status is.
