Default to the portable version of blst #12720
env:
  CGO_CFLAGS: "-O -D__BLST_PORTABLE__"
env:
  CGO_CFLAGS: "-O2 -D__BLST_PORTABLE__"
Most of blst is assembly; does -O2 really make any difference here?
No, you're right. Not really a difference. I can revert those changes if you'd like.
Full benchmarks for -O
$ CGO_CFLAGS="-O -D__BLST_PORTABLE__" go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/supranational/blst/bindings/go
BenchmarkCoreSignMinPk-20 3912 314523 ns/op
BenchmarkCoreVerifyMinPk-20 1717 686527 ns/op
BenchmarkCoreVerifyAggregateMinPk/1-20 1711 699531 ns/op
BenchmarkCoreVerifyAggregateMinPk/10-20 1148 1052687 ns/op
BenchmarkCoreVerifyAggregateMinPk/50-20 663 1809055 ns/op
BenchmarkCoreVerifyAggregateMinPk/100-20 412 2952502 ns/op
BenchmarkCoreVerifyAggregateMinPk/300-20 166 7185694 ns/op
BenchmarkCoreVerifyAggregateMinPk/1000-20 57 20762251 ns/op
BenchmarkCoreVerifyAggregateMinPk/4000-20 14 79476500 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1-20 1731 694436 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/10-20 1177 1040739 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/50-20 674 1844303 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/100-20 422 2891812 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/300-20 171 6932011 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1000-20 60 20040949 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/4000-20 15 76045650 ns/op
BenchmarkCoreAggregateMinPk/1-20 16617 72119 ns/op
BenchmarkCoreAggregateMinPk/10-20 5600 215089 ns/op
BenchmarkCoreAggregateMinPk/50-20 2538 486315 ns/op
BenchmarkCoreAggregateMinPk/100-20 1785 672548 ns/op
BenchmarkCoreAggregateMinPk/300-20 745 1641775 ns/op
BenchmarkCoreAggregateMinPk/1000-20 231 5103430 ns/op
BenchmarkCoreAggregateMinPk/4000-20 58 19627201 ns/op
BenchmarkBatchUncompressMinPk/Single-20 356 3343535 ns/op 0 B/op 0 allocs/op
BenchmarkBatchUncompressMinPk/Batch-20 2677 442532 ns/op 27976 B/op 25 allocs/op
BenchmarkMultiScalarP1/25000-20 64 18321594 ns/op
BenchmarkMultiScalarP1/50000-20 36 28999912 ns/op
BenchmarkMultiScalarP1/100000-20 16 70427250 ns/op
BenchmarkMultiScalarP1/200000-20 12 95167611 ns/op
BenchmarkToP1Affines/250-20 21806 55289 ns/op
BenchmarkToP1Affines/500-20 10000 106086 ns/op
BenchmarkToP1Affines/1000-20 9228 132230 ns/op
BenchmarkToP1Affines/2000-20 6531 184551 ns/op
BenchmarkToP1Affines/8000-20 3704 322139 ns/op
BenchmarkToP1Affines/32000-20 1564 768823 ns/op
BenchmarkCoreSignMinSig-20 8698 144016 ns/op
BenchmarkCoreVerifyMinSig-20 2031 588822 ns/op
BenchmarkCoreVerifyAggregateMinSig/1-20 1959 613773 ns/op
BenchmarkCoreVerifyAggregateMinSig/10-20 1323 915411 ns/op
BenchmarkCoreVerifyAggregateMinSig/50-20 723 1679000 ns/op
BenchmarkCoreVerifyAggregateMinSig/100-20 488 2442992 ns/op
BenchmarkCoreVerifyAggregateMinSig/300-20 216 5518329 ns/op
BenchmarkCoreVerifyAggregateMinSig/1000-20 78 15283615 ns/op
BenchmarkCoreVerifyAggregateMinSig/4000-20 20 57228042 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1-20 2040 588643 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/10-20 1342 891380 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/50-20 756 1632554 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/100-20 528 2332293 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/300-20 237 5072165 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1000-20 88 13764447 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/4000-20 22 51388369 ns/op
BenchmarkCoreAggregateMinSig/1-20 21436 55186 ns/op
BenchmarkCoreAggregateMinSig/10-20 6990 169180 ns/op
BenchmarkCoreAggregateMinSig/50-20 3094 389907 ns/op
BenchmarkCoreAggregateMinSig/100-20 2124 594690 ns/op
BenchmarkCoreAggregateMinSig/300-20 955 1270038 ns/op
BenchmarkCoreAggregateMinSig/1000-20 307 3863035 ns/op
BenchmarkCoreAggregateMinSig/4000-20 81 14152989 ns/op
BenchmarkBatchUncompressMinSig/Single-20 740 1629612 ns/op 0 B/op 0 allocs/op
BenchmarkBatchUncompressMinSig/Batch-20 3933 295187 ns/op 15688 B/op 25 allocs/op
BenchmarkMultiScalarP2/25000-20 24 48945903 ns/op
BenchmarkMultiScalarP2/50000-20 16 68898536 ns/op
BenchmarkMultiScalarP2/100000-20 6 193460312 ns/op
BenchmarkMultiScalarP2/200000-20 5 238246675 ns/op
BenchmarkToP2Affines/250-20 8223 151748 ns/op
BenchmarkToP2Affines/500-20 4089 295977 ns/op
BenchmarkToP2Affines/1000-20 3630 328849 ns/op
BenchmarkToP2Affines/2000-20 3130 386622 ns/op
BenchmarkToP2Affines/8000-20 1896 625344 ns/op
BenchmarkToP2Affines/32000-20 642 1873508 ns/op
PASS
ok github.com/supranational/blst/bindings/go 136.229s
Full benchmarks for -O2
$ CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/supranational/blst/bindings/go
BenchmarkCoreSignMinPk-20 3928 313500 ns/op
BenchmarkCoreVerifyMinPk-20 1720 690793 ns/op
BenchmarkCoreVerifyAggregateMinPk/1-20 1700 701779 ns/op
BenchmarkCoreVerifyAggregateMinPk/10-20 1143 1058096 ns/op
BenchmarkCoreVerifyAggregateMinPk/50-20 644 1923011 ns/op
BenchmarkCoreVerifyAggregateMinPk/100-20 414 2995941 ns/op
BenchmarkCoreVerifyAggregateMinPk/300-20 164 7198026 ns/op
BenchmarkCoreVerifyAggregateMinPk/1000-20 57 20836943 ns/op
BenchmarkCoreVerifyAggregateMinPk/4000-20 14 79148750 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1-20 1742 693768 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/10-20 1178 1010504 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/50-20 672 1877869 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/100-20 422 2916079 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/300-20 174 6950790 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1000-20 60 20006762 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/4000-20 15 76497744 ns/op
BenchmarkCoreAggregateMinPk/1-20 16608 72059 ns/op
BenchmarkCoreAggregateMinPk/10-20 5580 216024 ns/op
BenchmarkCoreAggregateMinPk/50-20 2526 481363 ns/op
BenchmarkCoreAggregateMinPk/100-20 1772 680957 ns/op
BenchmarkCoreAggregateMinPk/300-20 747 1633060 ns/op
BenchmarkCoreAggregateMinPk/1000-20 238 5074667 ns/op
BenchmarkCoreAggregateMinPk/4000-20 62 19081119 ns/op
BenchmarkBatchUncompressMinPk/Single-20 357 3328583 ns/op 0 B/op 0 allocs/op
BenchmarkBatchUncompressMinPk/Batch-20 2667 441938 ns/op 27976 B/op 25 allocs/op
BenchmarkMultiScalarP1/25000-20 64 18320754 ns/op
BenchmarkMultiScalarP1/50000-20 44 27992545 ns/op
BenchmarkMultiScalarP1/100000-20 16 70612201 ns/op
BenchmarkMultiScalarP1/200000-20 13 93837715 ns/op
BenchmarkToP1Affines/250-20 21732 55506 ns/op
BenchmarkToP1Affines/500-20 10000 105958 ns/op
BenchmarkToP1Affines/1000-20 9135 132008 ns/op
BenchmarkToP1Affines/2000-20 6562 184269 ns/op
BenchmarkToP1Affines/8000-20 3708 322652 ns/op
BenchmarkToP1Affines/32000-20 1567 764926 ns/op
BenchmarkCoreSignMinSig-20 8742 143377 ns/op
BenchmarkCoreVerifyMinSig-20 2034 585994 ns/op
BenchmarkCoreVerifyAggregateMinSig/1-20 1964 610177 ns/op
BenchmarkCoreVerifyAggregateMinSig/10-20 1326 923316 ns/op
BenchmarkCoreVerifyAggregateMinSig/50-20 727 1696463 ns/op
BenchmarkCoreVerifyAggregateMinSig/100-20 494 2529292 ns/op
BenchmarkCoreVerifyAggregateMinSig/300-20 216 5579919 ns/op
BenchmarkCoreVerifyAggregateMinSig/1000-20 78 15340082 ns/op
BenchmarkCoreVerifyAggregateMinSig/4000-20 20 57554008 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1-20 2031 586572 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/10-20 1339 913265 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/50-20 756 1656794 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/100-20 523 2356941 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/300-20 236 5095723 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1000-20 87 13775938 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/4000-20 22 51111231 ns/op
BenchmarkCoreAggregateMinSig/1-20 21855 54664 ns/op
BenchmarkCoreAggregateMinSig/10-20 7106 168377 ns/op
BenchmarkCoreAggregateMinSig/50-20 3132 393256 ns/op
BenchmarkCoreAggregateMinSig/100-20 2082 594708 ns/op
BenchmarkCoreAggregateMinSig/300-20 962 1252285 ns/op
BenchmarkCoreAggregateMinSig/1000-20 316 3877407 ns/op
BenchmarkCoreAggregateMinSig/4000-20 75 14820597 ns/op
BenchmarkBatchUncompressMinSig/Single-20 744 1634422 ns/op 0 B/op 0 allocs/op
BenchmarkBatchUncompressMinSig/Batch-20 4018 296196 ns/op 15688 B/op 25 allocs/op
BenchmarkMultiScalarP2/25000-20 24 48856965 ns/op
BenchmarkMultiScalarP2/50000-20 15 70723197 ns/op
BenchmarkMultiScalarP2/100000-20 6 191262354 ns/op
BenchmarkMultiScalarP2/200000-20 5 240672517 ns/op
BenchmarkToP2Affines/250-20 8160 151989 ns/op
BenchmarkToP2Affines/500-20 4094 296320 ns/op
BenchmarkToP2Affines/1000-20 3645 328523 ns/op
BenchmarkToP2Affines/2000-20 3116 387107 ns/op
BenchmarkToP2Affines/8000-20 1914 626657 ns/op
BenchmarkToP2Affines/32000-20 638 1874675 ns/op
PASS
ok github.com/supranational/blst/bindings/go 134.332s
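To make the "not really a difference" reading concrete, here is a small sketch that computes the relative change in ns/op between the two runs for a handful of rows copied from the output above. The `sample` type and `pctChange` helper are hypothetical names introduced only for this illustration, and single-run numbers like these carry run-to-run noise, so small swings in either direction should not be over-interpreted.

```go
package main

import "fmt"

// sample pairs one benchmark name with its ns/op from each run above.
// The type and field names are illustrative, not part of blst or Prysm.
type sample struct {
	name string
	nsO  float64 // ns/op with CGO_CFLAGS="-O -D__BLST_PORTABLE__"
	nsO2 float64 // ns/op with CGO_CFLAGS="-O2 -D__BLST_PORTABLE__"
}

// pctChange returns the relative change from base to next, in percent.
// A negative value means next took fewer ns/op (i.e. it was faster).
func pctChange(base, next float64) float64 {
	return (next - base) / base * 100
}

func main() {
	// Values copied from the two benchmark listings above.
	samples := []sample{
		{"CoreSignMinPk", 314523, 313500},
		{"CoreVerifyMinPk", 686527, 690793},
		{"CoreVerifyAggregateMinPk/50", 1809055, 1923011},
		{"MultiScalarP1/50000", 28999912, 27992545},
		{"CoreAggregateMinSig/4000", 14152989, 14820597},
	}
	for _, s := range samples {
		fmt.Printf("%-32s %+6.2f%%\n", s.name, pctChange(s.nsO, s.nsO2))
	}
}
```

For a more rigorous comparison, repeating each run several times and comparing the distributions would be a better basis than a single pass.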
Nah I'm just curious, no clue where the difference may come from
Decided to keep this change because -O2 affects more than just blst; it applies to the whole Prysm build.
Thanks @jtraglia. This PR generally looks good. We are waiting until after the upcoming release to review and merge it. We may want to preserve the ability to toggle the modern version of blst, with portable as the default. That would be a reversion of #12564 while keeping the portable/modern choice configurable at build time.
What type of PR is this?
Other
What does this PR do? Why is it needed?
In blst v0.3.11, the portable build automatically uses optimized code paths if the CPU supports them (aka "runtime detection"). If we default to the portable build, it should work on all systems and have the same (or better) performance as before. For example, it will be faster on CPUs that support SHA extensions: those are not enabled by default, but runtime detection checks for them.
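As a rough illustration of the runtime-detection idea, the sketch below picks a code path from a feature flag that would be probed once at startup. It is not blst's actual mechanism (blst is C and assembly); `cpuFeatures`, `hasSHA`, and `pickHashPath` are hypothetical names, and the flag is passed in by hand rather than read from the CPU.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for the result of a startup CPU probe.
type cpuFeatures struct {
	hasSHA bool // assumed flag: true when the CPU reports SHA extensions
}

// pickHashPath returns the name of the code path a portable build would take:
// the accelerated one when the feature is present, the generic one otherwise.
func pickHashPath(f cpuFeatures) string {
	if f.hasSHA {
		return "sha-extensions"
	}
	return "generic"
}

func main() {
	// The same binary behaves correctly on both kinds of machine.
	for _, f := range []cpuFeatures{{hasSHA: true}, {hasSHA: false}} {
		fmt.Printf("hasSHA=%v -> %s\n", f.hasSHA, pickHashPath(f))
	}
}
```

The point of the design is that one portable binary stays safe on older CPUs while still taking the fast path where it exists.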
--blst_modern=true setting in build.
-O2 instead of -O.
References