Default to the portable version of blst #12720

Merged: 4 commits, Sep 29, 2023

@jtraglia (Contributor) commented Aug 10, 2023

What type of PR is this?

Other

What does this PR do? Why is it needed?

In blst v0.3.11, the portable build automatically uses optimized code paths when the CPU supports them (i.e., "runtime detection"). If we default to the portable build, it should work on all systems and perform the same as (or better than) before. For example, it will be faster on CPUs that support SHA extensions: those are not enabled by default at compile time, but runtime detection checks for them.

  • Remove the --blst_modern=true setting from the build.
  • Compile with -O2 instead of -O.
    • -O2 is cgo's default optimization level, so the blst docs should be updated to match.

References

@jtraglia requested a review from a team as a code owner on August 10, 2023, 21:28

```diff
 env:
-  CGO_CFLAGS: "-O -D__BLST_PORTABLE__"
+  CGO_CFLAGS: "-O2 -D__BLST_PORTABLE__"
```

Contributor

Most of blst is assembly; does -O2 really make any difference here?

Contributor Author

No, you're right; there's not really a difference. I can revert those changes if you'd like.

Full benchmarks for -O
$ CGO_CFLAGS="-O -D__BLST_PORTABLE__" go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/supranational/blst/bindings/go
BenchmarkCoreSignMinPk-20                        	    3912	    314523 ns/op
BenchmarkCoreVerifyMinPk-20                      	    1717	    686527 ns/op
BenchmarkCoreVerifyAggregateMinPk/1-20           	    1711	    699531 ns/op
BenchmarkCoreVerifyAggregateMinPk/10-20          	    1148	   1052687 ns/op
BenchmarkCoreVerifyAggregateMinPk/50-20          	     663	   1809055 ns/op
BenchmarkCoreVerifyAggregateMinPk/100-20         	     412	   2952502 ns/op
BenchmarkCoreVerifyAggregateMinPk/300-20         	     166	   7185694 ns/op
BenchmarkCoreVerifyAggregateMinPk/1000-20        	      57	  20762251 ns/op
BenchmarkCoreVerifyAggregateMinPk/4000-20        	      14	  79476500 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1-20   	    1731	    694436 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/10-20  	    1177	   1040739 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/50-20  	     674	   1844303 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/100-20 	     422	   2891812 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/300-20 	     171	   6932011 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1000-20         	      60	  20040949 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/4000-20         	      15	  76045650 ns/op
BenchmarkCoreAggregateMinPk/1-20                          	   16617	     72119 ns/op
BenchmarkCoreAggregateMinPk/10-20                         	    5600	    215089 ns/op
BenchmarkCoreAggregateMinPk/50-20                         	    2538	    486315 ns/op
BenchmarkCoreAggregateMinPk/100-20                        	    1785	    672548 ns/op
BenchmarkCoreAggregateMinPk/300-20                        	     745	   1641775 ns/op
BenchmarkCoreAggregateMinPk/1000-20                       	     231	   5103430 ns/op
BenchmarkCoreAggregateMinPk/4000-20                       	      58	  19627201 ns/op
BenchmarkBatchUncompressMinPk/Single-20                   	     356	   3343535 ns/op	       0 B/op	       0 allocs/op
BenchmarkBatchUncompressMinPk/Batch-20                    	    2677	    442532 ns/op	   27976 B/op	      25 allocs/op
BenchmarkMultiScalarP1/25000-20                           	      64	  18321594 ns/op
BenchmarkMultiScalarP1/50000-20                           	      36	  28999912 ns/op
BenchmarkMultiScalarP1/100000-20                          	      16	  70427250 ns/op
BenchmarkMultiScalarP1/200000-20                          	      12	  95167611 ns/op
BenchmarkToP1Affines/250-20                               	   21806	     55289 ns/op
BenchmarkToP1Affines/500-20                               	   10000	    106086 ns/op
BenchmarkToP1Affines/1000-20                              	    9228	    132230 ns/op
BenchmarkToP1Affines/2000-20                              	    6531	    184551 ns/op
BenchmarkToP1Affines/8000-20                              	    3704	    322139 ns/op
BenchmarkToP1Affines/32000-20                             	    1564	    768823 ns/op
BenchmarkCoreSignMinSig-20                                	    8698	    144016 ns/op
BenchmarkCoreVerifyMinSig-20                              	    2031	    588822 ns/op
BenchmarkCoreVerifyAggregateMinSig/1-20                   	    1959	    613773 ns/op
BenchmarkCoreVerifyAggregateMinSig/10-20                  	    1323	    915411 ns/op
BenchmarkCoreVerifyAggregateMinSig/50-20                  	     723	   1679000 ns/op
BenchmarkCoreVerifyAggregateMinSig/100-20                 	     488	   2442992 ns/op
BenchmarkCoreVerifyAggregateMinSig/300-20                 	     216	   5518329 ns/op
BenchmarkCoreVerifyAggregateMinSig/1000-20                	      78	  15283615 ns/op
BenchmarkCoreVerifyAggregateMinSig/4000-20                	      20	  57228042 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1-20           	    2040	    588643 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/10-20          	    1342	    891380 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/50-20          	     756	   1632554 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/100-20         	     528	   2332293 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/300-20         	     237	   5072165 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1000-20        	      88	  13764447 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/4000-20        	      22	  51388369 ns/op
BenchmarkCoreAggregateMinSig/1-20                         	   21436	     55186 ns/op
BenchmarkCoreAggregateMinSig/10-20                        	    6990	    169180 ns/op
BenchmarkCoreAggregateMinSig/50-20                        	    3094	    389907 ns/op
BenchmarkCoreAggregateMinSig/100-20                       	    2124	    594690 ns/op
BenchmarkCoreAggregateMinSig/300-20                       	     955	   1270038 ns/op
BenchmarkCoreAggregateMinSig/1000-20                      	     307	   3863035 ns/op
BenchmarkCoreAggregateMinSig/4000-20                      	      81	  14152989 ns/op
BenchmarkBatchUncompressMinSig/Single-20                  	     740	   1629612 ns/op	       0 B/op	       0 allocs/op
BenchmarkBatchUncompressMinSig/Batch-20                   	    3933	    295187 ns/op	   15688 B/op	      25 allocs/op
BenchmarkMultiScalarP2/25000-20                           	      24	  48945903 ns/op
BenchmarkMultiScalarP2/50000-20                           	      16	  68898536 ns/op
BenchmarkMultiScalarP2/100000-20                          	       6	 193460312 ns/op
BenchmarkMultiScalarP2/200000-20                          	       5	 238246675 ns/op
BenchmarkToP2Affines/250-20                               	    8223	    151748 ns/op
BenchmarkToP2Affines/500-20                               	    4089	    295977 ns/op
BenchmarkToP2Affines/1000-20                              	    3630	    328849 ns/op
BenchmarkToP2Affines/2000-20                              	    3130	    386622 ns/op
BenchmarkToP2Affines/8000-20                              	    1896	    625344 ns/op
BenchmarkToP2Affines/32000-20                             	     642	   1873508 ns/op
PASS
ok  	github.com/supranational/blst/bindings/go	136.229s
Full benchmarks for -O2
$ CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/supranational/blst/bindings/go
BenchmarkCoreSignMinPk-20                        	    3928	    313500 ns/op
BenchmarkCoreVerifyMinPk-20                      	    1720	    690793 ns/op
BenchmarkCoreVerifyAggregateMinPk/1-20           	    1700	    701779 ns/op
BenchmarkCoreVerifyAggregateMinPk/10-20          	    1143	   1058096 ns/op
BenchmarkCoreVerifyAggregateMinPk/50-20          	     644	   1923011 ns/op
BenchmarkCoreVerifyAggregateMinPk/100-20         	     414	   2995941 ns/op
BenchmarkCoreVerifyAggregateMinPk/300-20         	     164	   7198026 ns/op
BenchmarkCoreVerifyAggregateMinPk/1000-20        	      57	  20836943 ns/op
BenchmarkCoreVerifyAggregateMinPk/4000-20        	      14	  79148750 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1-20   	    1742	    693768 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/10-20  	    1178	   1010504 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/50-20  	     672	   1877869 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/100-20 	     422	   2916079 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/300-20 	     174	   6950790 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/1000-20         	      60	  20006762 ns/op
BenchmarkVerifyAggregateUncompressedMinPk/4000-20         	      15	  76497744 ns/op
BenchmarkCoreAggregateMinPk/1-20                          	   16608	     72059 ns/op
BenchmarkCoreAggregateMinPk/10-20                         	    5580	    216024 ns/op
BenchmarkCoreAggregateMinPk/50-20                         	    2526	    481363 ns/op
BenchmarkCoreAggregateMinPk/100-20                        	    1772	    680957 ns/op
BenchmarkCoreAggregateMinPk/300-20                        	     747	   1633060 ns/op
BenchmarkCoreAggregateMinPk/1000-20                       	     238	   5074667 ns/op
BenchmarkCoreAggregateMinPk/4000-20                       	      62	  19081119 ns/op
BenchmarkBatchUncompressMinPk/Single-20                   	     357	   3328583 ns/op	       0 B/op	       0 allocs/op
BenchmarkBatchUncompressMinPk/Batch-20                    	    2667	    441938 ns/op	   27976 B/op	      25 allocs/op
BenchmarkMultiScalarP1/25000-20                           	      64	  18320754 ns/op
BenchmarkMultiScalarP1/50000-20                           	      44	  27992545 ns/op
BenchmarkMultiScalarP1/100000-20                          	      16	  70612201 ns/op
BenchmarkMultiScalarP1/200000-20                          	      13	  93837715 ns/op
BenchmarkToP1Affines/250-20                               	   21732	     55506 ns/op
BenchmarkToP1Affines/500-20                               	   10000	    105958 ns/op
BenchmarkToP1Affines/1000-20                              	    9135	    132008 ns/op
BenchmarkToP1Affines/2000-20                              	    6562	    184269 ns/op
BenchmarkToP1Affines/8000-20                              	    3708	    322652 ns/op
BenchmarkToP1Affines/32000-20                             	    1567	    764926 ns/op
BenchmarkCoreSignMinSig-20                                	    8742	    143377 ns/op
BenchmarkCoreVerifyMinSig-20                              	    2034	    585994 ns/op
BenchmarkCoreVerifyAggregateMinSig/1-20                   	    1964	    610177 ns/op
BenchmarkCoreVerifyAggregateMinSig/10-20                  	    1326	    923316 ns/op
BenchmarkCoreVerifyAggregateMinSig/50-20                  	     727	   1696463 ns/op
BenchmarkCoreVerifyAggregateMinSig/100-20                 	     494	   2529292 ns/op
BenchmarkCoreVerifyAggregateMinSig/300-20                 	     216	   5579919 ns/op
BenchmarkCoreVerifyAggregateMinSig/1000-20                	      78	  15340082 ns/op
BenchmarkCoreVerifyAggregateMinSig/4000-20                	      20	  57554008 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1-20           	    2031	    586572 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/10-20          	    1339	    913265 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/50-20          	     756	   1656794 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/100-20         	     523	   2356941 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/300-20         	     236	   5095723 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/1000-20        	      87	  13775938 ns/op
BenchmarkVerifyAggregateUncompressedMinSig/4000-20        	      22	  51111231 ns/op
BenchmarkCoreAggregateMinSig/1-20                         	   21855	     54664 ns/op
BenchmarkCoreAggregateMinSig/10-20                        	    7106	    168377 ns/op
BenchmarkCoreAggregateMinSig/50-20                        	    3132	    393256 ns/op
BenchmarkCoreAggregateMinSig/100-20                       	    2082	    594708 ns/op
BenchmarkCoreAggregateMinSig/300-20                       	     962	   1252285 ns/op
BenchmarkCoreAggregateMinSig/1000-20                      	     316	   3877407 ns/op
BenchmarkCoreAggregateMinSig/4000-20                      	      75	  14820597 ns/op
BenchmarkBatchUncompressMinSig/Single-20                  	     744	   1634422 ns/op	       0 B/op	       0 allocs/op
BenchmarkBatchUncompressMinSig/Batch-20                   	    4018	    296196 ns/op	   15688 B/op	      25 allocs/op
BenchmarkMultiScalarP2/25000-20                           	      24	  48856965 ns/op
BenchmarkMultiScalarP2/50000-20                           	      15	  70723197 ns/op
BenchmarkMultiScalarP2/100000-20                          	       6	 191262354 ns/op
BenchmarkMultiScalarP2/200000-20                          	       5	 240672517 ns/op
BenchmarkToP2Affines/250-20                               	    8160	    151989 ns/op
BenchmarkToP2Affines/500-20                               	    4094	    296320 ns/op
BenchmarkToP2Affines/1000-20                              	    3645	    328523 ns/op
BenchmarkToP2Affines/2000-20                              	    3116	    387107 ns/op
BenchmarkToP2Affines/8000-20                              	    1914	    626657 ns/op
BenchmarkToP2Affines/32000-20                             	     638	   1874675 ns/op
PASS
ok  	github.com/supranational/blst/bindings/go	134.332s
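To put "not really a difference" in numbers, the sketch below computes the per-benchmark relative change from -O to -O2 for a handful of rows copied from the two runs above. The row selection is arbitrary and the `result`/`pctChange` names are illustrative; single runs on one darwin/arm64 machine cannot separate small effects from noise, so treat the output as a rough reading rather than a verdict.

```go
package main

import "fmt"

// result holds ns/op for one benchmark under each flag set,
// copied from the -O and -O2 runs above (darwin/arm64).
type result struct {
	name string
	nsO  float64 // CGO_CFLAGS="-O -D__BLST_PORTABLE__"
	nsO2 float64 // CGO_CFLAGS="-O2 -D__BLST_PORTABLE__"
}

// pctChange returns the relative change from before to after, in percent.
// A negative value means the -O2 run was faster.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	results := []result{
		{"CoreSignMinPk", 314523, 313500},
		{"CoreVerifyMinPk", 686527, 690793},
		{"CoreVerifyAggregateMinPk/50", 1809055, 1923011},
		{"MultiScalarP1/50000", 28999912, 27992545},
		{"CoreSignMinSig", 144016, 143377},
		{"CoreVerifyMinSig", 588822, 585994},
	}
	for _, r := range results {
		fmt.Printf("%-30s %+6.2f%%\n", r.name, pctChange(r.nsO, r.nsO2))
	}
}
```

For these rows, signing and single verification move by well under 1%, while the aggregate-verify and multi-scalar rows swing by several percent in opposite directions, which looks more like run-to-run variance than a systematic gain. Repeating each run with `go test -count=N` and comparing the results with a tool such as benchstat would give a firmer answer.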

Contributor

Nah, I'm just curious; I have no clue where a difference would come from.

Contributor Author

I decided to keep this change because -O2 affects more than just blst: the setting applies to the whole Prysm build.

Review thread on third_party/blst/blst.BUILD (outdated, resolved)

@prestonvanloon (Member) commented Aug 18, 2023

Thanks, @jtraglia. This PR generally looks good. We're waiting until after the upcoming release to review and merge it.

We may want to keep the ability to switch to the modern build of blst, with portable as the default. That would mean reverting #12564 while keeping the portable/modern choice configurable at build time.

@prestonvanloon merged commit b667c68 into prysmaticlabs:develop on Sep 29, 2023
17 checks passed
@jtraglia deleted the default-portable-blst branch on September 29, 2023, 20:44

Successfully merging this pull request may close these issues.

Caught SIGILL in blst_cgo_init, consult <blst>/bindinds/go/README.md.
3 participants