Cache alignment for serial and parallel FFT and IFFT #245

jon-chuang · 2021-03-24T12:19:34Z

Description

closes: #242
Results:
FFT Parallel 16C32T
2^20 - 11%
2^21 - 17%
2^22 - 25%

IFFT Parallel 16C32T
2^20 - 11%
2^21 - 23%
2^22 - 17%

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (master)
Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
Wrote unit tests
Updated relevant documentation in the code
Added a relevant changelog entry to the Pending section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer

CHANGELOG.md

poly/src/domain/radix2/fft.rs

Pratyush · 2021-03-24T12:54:30Z

Thanks for the PR @jon-chuang! could you also benchmark the difference at lower thread count and at smaller sizes? IIRC when @ValarDragon tried the realignment strategy for the parallel case, there was a big slowdown for smaller sizes.

ValarDragon · 2021-03-24T16:22:29Z

I tried this idea for the parallel case before, and was getting 20%+ slowdowns at small FFT's. (2^15, 2^16) I'll check again for this PR in particular to see how it performs.

In the parallel case, its not super clear to me that taking 50% extra memory is a great trade-off to be making across the board

jon-chuang · 2021-03-25T06:03:43Z

@ValarDragon Yes, I was getting small slowdowns (~10%) too, both for parallel and non-parallel case.
Its hard to know in advance which of the two would work better for any given curve and number of threads.

With regards to 50% extra memory, I think this is smaller than that, its more like 25% of the length of the FFT array.

But currently, the parallelisation speedup I'm getting is pretty terrible: 4x on an 8C16T machine... There seems to be a lot of room for improvement... Thread util is about 50%. Pretty crap.

jon-chuang · 2021-03-25T06:55:26Z

Here are the terrible sad results:

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Collecting 100 samples in estimated 5.1728 s (1700 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/32768
                        time:   [2.9688 ms 3.0300 ms 3.0938 ms]
                        change: [+29.440% +33.263% +37.093%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Collecting 100 samples in estimated 5.5117 s (1000 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/65536
                        time:   [5.4641 ms 5.6596 ms 5.8604 ms]
                        change: [+23.748% +28.742% +34.456%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Collecting 100 samples in estimated 6.1528 s (500 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/131072
                        time:   [10.995 ms 11.526 ms 12.092 ms]
                        change: [+22.798% +29.560% +35.644%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Collecting 100 samples in estimated 6.9858 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/262144
                        time:   [24.550 ms 25.577 ms 26.668 ms]
                        change: [+16.490% +21.934% +27.285%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.4s, or reduce sample count to 90.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Collecting 100 samples in estimated 5.4361 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/524288
                        time:   [50.259 ms 52.124 ms 54.128 ms]
                        change: [+5.6750% +9.6124% +14.538%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) high mild
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Collecting 100 samples in estimated 9.1722 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/1048576
                        time:   [92.486 ms 93.790 ms 95.188 ms]
                        change: [-10.309% -7.5650% -4.9447%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.1s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Collecting 100 samples in estimated 20.147 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/2097152
                        time:   [194.98 ms 197.81 ms 200.96 ms]
                        change: [-8.4606% -6.8893% -5.1352%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 38.1s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Collecting 100 samples in estimated 38.052 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/4194304
                        time:   [382.67 ms 384.74 ms 386.91 ms]
                        change: [-17.897% -16.662% -15.519%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Collecting 100 samples in estimated 5.0437 s (2100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/32768
                        time:   [2.6042 ms 2.6729 ms 2.7477 ms]
                        change: [+6.9566% +10.969% +15.304%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Collecting 100 samples in estimated 5.0822 s (800 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/65536
                        time:   [5.2793 ms 5.4697 ms 5.6789 ms]
                        change: [+16.143% +22.205% +28.317%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Collecting 100 samples in estimated 5.1647 s (500 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/131072
                        time:   [9.9436 ms 10.117 ms 10.294 ms]
                        change: [+6.8750% +9.9687% +13.350%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Collecting 100 samples in estimated 7.0024 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/262144
                        time:   [22.617 ms 22.989 ms 23.383 ms]
                        change: [+4.4927% +7.6459% +10.659%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Collecting 100 samples in estimated 9.6489 s (200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/524288
                        time:   [47.961 ms 48.583 ms 49.270 ms]
                        change: [-1.1671% +1.3471% +3.8213%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.4s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Collecting 100 samples in estimated 9.4081 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/1048576
                        time:   [92.923 ms 93.911 ms 94.995 ms]
                        change: [-6.0794% -4.1757% -2.3586%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.3s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Collecting 100 samples in estimated 19.337 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/2097152
                        time:   [193.46 ms 194.86 ms 196.31 ms]
                        change: [-10.583% -9.2857% -8.0497%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 39.5s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Collecting 100 samples in estimated 39.478 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/4194304
                        time:   [409.03 ms 414.09 ms 419.59 ms]
                        change: [-9.6244% -8.4244% -7.0350%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Collecting 100 samples in estimated 5.2216 s (1700 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/32768
                        time:   [3.1329 ms 3.2375 ms 3.3454 ms]
                        change: [+18.493% +22.983% +28.159%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Collecting 100 samples in estimated 5.5730 s (900 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/65536
                        time:   [5.2982 ms 5.4014 ms 5.5130 ms]
                        change: [+2.2447% +4.6582% +7.2173%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Collecting 100 samples in estimated 6.0598 s (500 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/131072
                        time:   [11.114 ms 11.369 ms 11.640 ms]
                        change: [+3.9870% +6.8438% +10.094%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Collecting 100 samples in estimated 5.0236 s (200 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/262144
                        time:   [24.164 ms 24.654 ms 25.213 ms]
                        change: [-4.6415% -0.6099% +3.3769%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288: Collecting 100 samples in estimated 9.9466 s (200 iterations)

jon-chuang · 2021-03-25T07:38:11Z

This much more favourable result is achieved when the subchunks are only utilised for very large gaps

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Collecting 100 samples in estimated 5.0861 s (2000 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/32768
                        time:   [2.5101 ms 2.5442 ms 2.5819 ms]
                        change: [+4.4667% +7.5754% +10.604%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Collecting 100 samples in estimated 5.2808 s (1200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/65536
                        time:   [4.3433 ms 4.4042 ms 4.4675 ms]
                        change: [-9.5771% -5.5606% -1.7496%] (p = 0.01 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Collecting 100 samples in estimated 5.7595 s (600 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/131072
                        time:   [9.4923 ms 9.6384 ms 9.7949 ms]
                        change: [-5.7956% -2.7784% +0.1900%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Collecting 100 samples in estimated 6.5137 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/262144
                        time:   [21.344 ms 21.634 ms 21.931 ms]
                        change: [-19.379% -16.122% -12.667%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Collecting 100 samples in estimated 9.4797 s (200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/524288
                        time:   [47.809 ms 48.882 ms 50.051 ms]
                        change: [-11.130% -8.0677% -4.7549%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Collecting 100 samples in estimated 9.3142 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/1048576
                        time:   [91.174 ms 92.364 ms 93.596 ms]
                        change: [-10.431% -8.5880% -6.7047%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.2s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Collecting 100 samples in estimated 20.218 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/2097152
                        time:   [192.43 ms 194.44 ms 196.56 ms]
                        change: [-10.448% -9.1806% -7.8372%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 38.1s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Collecting 100 samples in estimated 38.124 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/4194304: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/4194304
                        time:   [379.72 ms 381.95 ms 384.38 ms]
                        change: [-15.891% -15.261% -14.629%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Collecting 100 samples in estimated 5.0694 s (2100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/32768
                        time:   [2.4197 ms 2.4658 ms 2.5184 ms]
                        change: [+5.6856% +8.6309% +11.566%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Collecting 100 samples in estimated 5.2877 s (1200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/65536
                        time:   [4.4157 ms 4.5204 ms 4.6347 ms]
                        change: [-2.8322% +0.5157% +4.0209%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Collecting 100 samples in estimated 5.4422 s (600 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/131072
                        time:   [9.0741 ms 9.2183 ms 9.3674 ms]
                        change: [-1.7590% +1.2289% +4.0510%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Collecting 100 samples in estimated 6.5420 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/262144
                        time:   [21.510 ms 21.792 ms 22.091 ms]
                        change: [+0.6440% +3.0114% +5.3517%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Collecting 100 samples in estimated 9.6532 s (200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/524288
                        time:   [47.697 ms 48.500 ms 49.346 ms]
                        change: [-5.0135% -2.0695% +0.7938%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Collecting 100 samples in estimated 9.2140 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/1048576
                        time:   [92.493 ms 93.593 ms 94.730 ms]
                        change: [-10.478% -7.9674% -5.6798%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.0s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Collecting 100 samples in estimated 18.992 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/2097152: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/2097152
                        time:   [190.88 ms 192.36 ms 193.91 ms]
                        change: [-12.597% -11.378% -10.180%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 39.3s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Collecting 100 samples in estimated 39.289 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/4194304: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/4194304
                        time:   [397.76 ms 400.90 ms 404.22 ms]
                        change: [-13.641% -12.583% -11.499%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Collecting 100 samples in estimated 5.1805 s (1900 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/32768
                        time:   [2.7607 ms 2.8130 ms 2.8719 ms]
                        change: [+9.3930% +11.864% +14.808%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Collecting 100 samples in estimated 5.2517 s (1000 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/65536
                        time:   [4.9596 ms 5.0647 ms 5.1799 ms]
                        change: [+3.9856% +6.3693% +8.9648%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Collecting 100 samples in estimated 5.1420 s (500 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/131072
                        time:   [10.354 ms 10.525 ms 10.709 ms]
                        change: [-20.452% -16.308% -11.850%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Collecting 100 samples in estimated 7.4327 s (300 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/262144
                        time:   [22.631 ms 22.892 ms 23.160 ms]
                        change: [-9.1506% -6.2321% -3.4109%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Details: 8C16T.
I think this makes it good enough. I will now bench for smaller sizes non parallel.

@Pratyush @ValarDragon Do help corroborate the results as I saw that the results fluctuated a lot between runs.

There is no change in results for MNT. Not sure if this is to be expected.

jon-chuang · 2021-03-25T08:04:19Z

Here is the data for the serial case:

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Collecting 100 samples in estimated 5.8394 s (600 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/32768
                        time:   [9.7242 ms 9.7499 ms 9.7808 ms]
                        change: [+8.1595% +8.9352% +9.6607%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Collecting 100 samples in estimated 6.2526 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/65536
                        time:   [20.658 ms 20.737 ms 20.827 ms]
                        change: [+3.1853% +4.1361% +5.0873%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Collecting 100 samples in estimated 8.9820 s (200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/131072
                        time:   [44.775 ms 44.910 ms 45.046 ms]
                        change: [+4.9416% +5.7576% +6.5064%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.8s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Collecting 100 samples in estimated 9.7545 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/262144
                        time:   [96.474 ms 96.656 ms 96.845 ms]
                        change: [-0.6048% -0.1526% +0.2655%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.6s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Collecting 100 samples in estimated 20.568 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/524288
                        time:   [205.17 ms 205.65 ms 206.14 ms]
                        change: [-3.5415% -2.9891% -2.4759%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 43.5s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Collecting 100 samples in estimated 43.523 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_fft_in_place/1048576
                        time:   [432.84 ms 433.57 ms 434.31 ms]
                        change: [-10.785% -10.349% -9.9450%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Collecting 100 samples in estimated 5.0249 s (500 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/32768: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/32768
                        time:   [9.9924 ms 10.022 ms 10.056 ms]
                        change: [+4.1057% +4.6042% +5.1030%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Collecting 100 samples in estimated 6.3489 s (300 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/65536: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/65536
                        time:   [21.442 ms 21.525 ms 21.626 ms]
                        change: [+2.3683% +3.0082% +3.7269%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Collecting 100 samples in estimated 9.2127 s (200 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/131072: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/131072
                        time:   [45.954 ms 46.041 ms 46.136 ms]
                        change: [+0.2819% +1.0548% +1.7724%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, or reduce sample count to 50.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Collecting 100 samples in estimated 9.8725 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/262144: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/262144
                        time:   [99.226 ms 99.470 ms 99.726 ms]
                        change: [-2.2848% -1.6243% -1.0141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.7s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Collecting 100 samples in estimated 20.672 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/524288: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/524288
                        time:   [206.77 ms 207.12 ms 207.48 ms]
                        change: [-8.2831% -7.6417% -7.0368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 43.9s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Collecting 100 samples in estimated 43.944 s (100 iterations)
Benchmarking "bls12_381 - radix2" - subgroup_ifft_in_place/1048576: Analyzing
"bls12_381 - radix2" - subgroup_ifft_in_place/1048576
                        time:   [439.90 ms 440.62 ms 441.34 ms]
                        change: [-12.364% -11.914% -11.519%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Collecting 100 samples in estimated 5.4242 s (500 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/32768: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/32768
                        time:   [10.933 ms 10.960 ms 10.987 ms]
                        change: [+9.6970% +10.177% +10.644%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Collecting 100 samples in estimated 6.9663 s (300 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/65536: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/65536
                        time:   [23.195 ms 23.261 ms 23.328 ms]
                        change: [+8.1506% +8.7586% +9.3377%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Collecting 100 samples in estimated 9.8730 s (200 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/131072: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/131072
                        time:   [49.664 ms 49.792 ms 49.921 ms]
                        change: [+7.0807% +7.5261% +7.9696%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.7s, or reduce sample count to 40.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Collecting 100 samples in estimated 10.745 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/262144: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/262144
                        time:   [106.18 ms 106.47 ms 106.78 ms]
                        change: [+0.2334% +0.7263% +1.1819%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 22.4s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288: Collecting 100 samples in estimated 22.385 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/524288: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/524288
                        time:   [222.00 ms 222.47 ms 222.97 ms]
                        change: [-4.5002% -4.0088% -3.5321%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/1048576
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 47.5s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/1048576: Collecting 100 samples in estimated 47.524 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_fft_in_place/1048576: Analyzing
"bls12_381 - radix2" - coset_fft_in_place/1048576
                        time:   [470.46 ms 471.29 ms 472.14 ms]
                        change: [-8.8260% -8.4579% -8.1081%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/32768
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/32768: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/32768: Collecting 100 samples in estimated 5.3178 s (500 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/32768: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/32768
                        time:   [10.651 ms 10.712 ms 10.783 ms]
                        change: [+4.0475% +4.8278% +5.6070%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/65536
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/65536: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/65536: Collecting 100 samples in estimated 6.8066 s (300 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/65536: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/65536
                        time:   [22.396 ms 22.472 ms 22.559 ms]
                        change: [+2.2767% +2.8382% +3.4491%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/131072
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/131072: Warming up for 3.0000 s
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/131072: Collecting 100 samples in estimated 9.7565 s (200 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/131072: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/131072
                        time:   [48.583 ms 48.761 ms 48.952 ms]
                        change: [+1.7305% +2.5542% +3.3143%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/262144
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/262144: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.4s, or reduce sample count to 40.
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/262144: Collecting 100 samples in estimated 10.438 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/262144: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/262144
                        time:   [102.48 ms 102.67 ms 102.86 ms]
                        change: [-1.9921% -1.5773% -1.1758%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/524288
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/524288: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 21.6s, or reduce sample count to 20.
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/524288: Collecting 100 samples in estimated 21.610 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/524288: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/524288
                        time:   [216.67 ms 217.11 ms 217.58 ms]
                        change: [-5.3576% -4.7837% -4.2304%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/1048576
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/1048576: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 45.9s, or reduce sample count to 10.
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/1048576: Collecting 100 samples in estimated 45.889 s (100 iterations)
Benchmarking "bls12_381 - radix2" - coset_ifft_in_place/1048576: Analyzing
"bls12_381 - radix2" - coset_ifft_in_place/1048576
                        time:   [456.91 ms 457.48 ms 458.09 ms]
                        change: [-12.902% -12.442% -12.016%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

I think the compaction copies could be made to be avoided in ththe cases when the problem size is small enough... but this comes at the risk of working for one set of params and not working for others... can't optimise everything perfectly, I guess.
More extensive benching should be done to include MNT etc.

I can't decide if I am more in favour of optimising the small FFTs or the big ones... the sad fact is that unless one has some instrumentation to do live AB testing, it's impossible to autotune everything.

I suppose that one can create an optimisation config file, while does conditional compilation based on a single file. The unfortunate ugly fact is that currently one must manage this with features. Although searching over such a large optimisation space is hard, there exist many techniques in the literature like bayesian optimisation which can quickly determine locally-optimal parameters from expensive data generation i.e. benching of downstream functions. Although ideally, one is able to compile inidividual highly optimised functions in a binary with separate configs, this itself is hard.

If only there were a way to directly access LLVM's optimisation infrastructure from ordinary rust code, that allows for such an optimisation process.

This would be a truly beautiful feature in my mind. e.g. https://www.cl.cam.ac.uk/~ey204/pubs/MPHIL/2017_SZYMON.pdf wrought large.

jon-chuang · 2021-03-25T08:26:03Z

Btw @ValarDragon , you do know that your benches were wrong, right? They did not include the cost of the roots compaction, due to the bug. So all those improvements that were stated are not proven. In fact, in the above benches, it is shown that it is much harder to find a suitable tradeoff.

Pratyush · 2021-03-25T08:28:48Z

I think the numbers for parallel realignment that @ValarDragon reported were from his old PR #177, which didn't have the bug.

CHANGELOG.md

jon-chuang · 2021-03-25T08:38:31Z

I think the numbers for parallel realignment that @ValarDragon reported were from his old PR #177, which didn't have the bug.

I'm not sure if I believe this. Could you confirm @ValarDragon ?

Co-authored-by: Pratyush Mishra <[email protected]>

jon-chuang · 2021-03-25T08:46:09Z

All I can think howetver, is that since reasonable circuits are about size at least 2^18, one should disregard the benchmarks for 2^17 and below the current PR is doing very well. Especially, one would expect even better improvements for large N i.e. 2^25

Pratyush · 2021-03-25T09:02:28Z

It seems like the FFT code is faster even at small sizes for a small number of cores?

4 threads

"bls12_381 - radix2" - subgroup_fft_in_place/32768                                                                             
                        time:   [4.9159 ms 4.9797 ms 5.0551 ms]
                        change: [-10.802% -8.3784% -5.8054%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
"bls12_381 - radix2" - subgroup_fft_in_place/65536                                                                            
                        time:   [10.724 ms 10.784 ms 10.851 ms]
                        change: [-6.0406% -5.4021% -4.7377%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
"bls12_381 - radix2" - subgroup_fft_in_place/131072                                                                            
                        time:   [22.814 ms 22.908 ms 23.003 ms]
                        change: [-5.7885% -5.2133% -4.5943%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
"bls12_381 - radix2" - subgroup_fft_in_place/262144                                                                            
                        time:   [46.696 ms 47.291 ms 47.836 ms]
                        change: [-13.601% -12.393% -11.245%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  14 (14.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/524288: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.7s, or reduce sample count to 50.
"bls12_381 - radix2" - subgroup_fft_in_place/524288                                                                            
                        time:   [94.261 ms 94.586 ms 94.937 ms]
                        change: [-12.658% -12.106% -11.563%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/1048576: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.5s, or reduce sample count to 20.
"bls12_381 - radix2" - subgroup_fft_in_place/1048576                                                                            
                        time:   [184.78 ms 185.32 ms 185.85 ms]
                        change: [-15.254% -14.912% -14.566%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking "bls12_381 - radix2" - subgroup_fft_in_place/2097152: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 42.3s, or reduce sample count to 10.
"bls12_381 - radix2" - subgroup_fft_in_place/2097152                                                                            
                        time:   [435.73 ms 438.96 ms 442.23 ms]
                        change: [-25.799% -25.038% -24.289%] (p = 0.00 < 0.05)
                        Performance has improved.

jon-chuang · 2021-03-25T09:08:37Z

@Pratyush , yes, that is expected, as the code worries less about partitioning the data into subsets of reasonable size.

poly/src/domain/radix2/fft.rs

Pratyush · 2021-03-30T12:55:54Z

poly/src/domain/radix2/fft.rs

+        let compaction_size = core::cmp::min(
+            roots_cache.len() / 2,
+            roots_cache.len() / MIN_COMPACTION_CHUNKS,
+        );


What happens when roots_cache.len() is 2 or less than MIN_COMPACTION_CHUNKS? Could you amend the tests to check that as well? Thanks!

Hmm the compaction wouldn't happen. So we don't have to worry about it. Notice that cmp::min is only necessary for MIN_COMPACTION_CHUNKS = 1, since chunks > 0.

If roots_cache.len() < MIN_COMPACTION_CHUNKS, then chunks <= xi.len() / 2 = roots_cache.len() < MIN_COMPACTION_CHUNKS

Ok sounds good, a comment to that effect would be great.

poly/src/domain/radix2/fft.rs

Pratyush · 2021-03-30T13:35:48Z

This looks great. In terms of refactoring, there's still some common code between io_helper and oi_helper, namely

algebra/poly/src/domain/radix2/fft.rs

Lines 186 to 202 in a03c4a0

    
           cfg_chunks_mut!(xi, chunk_size).for_each(|cxi| { 
        
               let (lo, hi) = cxi.split_at_mut(gap); 
        
               // If the chunk is sufficiently big that parallelism helps, 
        
               // we parallelize the butterfly operation within the chunk. 
        
               if gap > MIN_PROBLEM_SIZE && num_chunks < max_threads { 
        
                   cfg_iter_mut!(lo) 
        
                       .zip(cfg_iter_mut!(hi)) 
        
                       .zip(cfg_iter!(roots).step_by(step)) 
        
                       .for_each(Self::butterfly_fn_io); 
        
               } else { 
        
                   lo.iter_mut() 
        
                       .zip(hi) 
        
                       .zip(roots.iter().step_by(step)) 
        
                       .for_each(Self::butterfly_fn_io); 
        
               } 
        
           });

and

algebra/poly/src/domain/radix2/fft.rs

Lines 239 to 255 in a03c4a0

    
           cfg_chunks_mut!(xi, chunk_size).for_each(|cxi| { 
        
               let (lo, hi) = cxi.split_at_mut(gap); 
        
               // If the chunk is sufficiently big that parallelism helps, 
        
               // we parallelize the butterfly operation within the chunk. 
        
               if gap > MIN_PROBLEM_SIZE && num_chunks < max_threads { 
        
                   cfg_iter_mut!(lo) 
        
                       .zip(cfg_iter_mut!(hi)) 
        
                       .zip(cfg_iter!(roots).step_by(step)) 
        
                       .for_each(Self::butterfly_fn_oi); 
        
               } else { 
        
                   lo.iter_mut() 
        
                       .zip(hi) 
        
                       .zip(roots.iter().step_by(step)) 
        
                       .for_each(Self::butterfly_fn_oi); 
        
               } 
        
           });

I think we should extract these into a common method as well, so that really the only thing different between the two is in in the root compaction.

jon-chuang · 2021-03-30T13:41:35Z

This looks great. In terms of refactoring, there's still some common code between io_helper and oi_helper, namely

algebra/poly/src/domain/radix2/fft.rs

Line 186 in a03c4a0

cfg_chunks_mut!(xi, chunk_size).for_each(|cxi| {

to

algebra/poly/src/domain/radix2/fft.rs

Line 202 in a03c4a0

});

and

algebra/poly/src/domain/radix2/fft.rs

Line 239 in a03c4a0

cfg_chunks_mut!(xi, chunk_size).for_each(|cxi| {

to

algebra/poly/src/domain/radix2/fft.rs

Line 255 in a03c4a0

});

I think we should extract these into a common method as well, so that really the only thing different between the two is in in the root compaction.

Not sure about passing a function pointer though...then one instantiates for both anyway... Rather than inlining sadly

Pratyush · 2021-03-30T13:42:35Z

I think the method should be inlined anyway? As long as you use generics

jon-chuang · 2021-03-30T13:44:03Z

Oh yes, let me do that

ValarDragon · 2021-03-30T14:44:56Z

Just benchmarked the FFT on a 48 logical core (24 physical core) machine. It was a 1-2% slowdown until an FFT of size 2^18, after which it provided speedups.

I'm going to try to benchmark on a clean 8 core laptop instance tomorrow, just to see how it performs for laptop provers' cache settings

jon-chuang · 2021-03-30T14:55:16Z

@ValarDragon cool! May I know if this was against the master or by setting the compaction threshold to be impossibly large?
Were the speedups pronounced for large N? Or did you stop measuring at 2^21?

I find that even on a 8C16T the speedup increases further for 2^23. I was hoping to get data on very large N and many cores.

ValarDragon · 2021-03-30T15:05:30Z

That was measured against master, starting from 2^12 sized FFTs, and ranging until 2^22 sized. I'll measure up to higher sizes

poly/src/domain/radix2/fft.rs

CHANGELOG.md

Pratyush · 2021-03-30T17:53:24Z

This LGTM modulo the last two comments above, and pending @ValarDragon's benchmark

ValarDragon · 2021-04-08T21:00:11Z

Confirmed I got speedups on an 8 core laptop as well for relevant size range. (2^12 up to 2^21), glad this speedup got in!

jon-chuang added 4 commits March 24, 2021 18:28

stash

268ba1d

parallelise root compaction

92e913d

cache local ifft

ea8d40b

Merge branch 'master' into jonch/fft-opt

4c914cc

jon-chuang requested review from ValarDragon and Pratyush and removed request for ValarDragon March 24, 2021 12:24

jon-chuang added 2 commits March 24, 2021 20:31

uncomment tests

c3209e6

changelog

17c36fe

Pratyush reviewed Mar 24, 2021

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Pratyush reviewed Mar 24, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 24, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 24, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

add comments, changelog

0fa0619

use subchunks only for very large gap

bb65472

Pratyush reviewed Mar 25, 2021

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Update CHANGELOG.md

f943e54

Co-authored-by: Pratyush Mishra <[email protected]>

Improve comments

a03c4a0

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Outdated Show resolved Hide resolved

Extract into apply_butterfly

695ccc8

jon-chuang force-pushed the jonch/fft-opt branch from 7e6104a to c3eaafe Compare March 30, 2021 14:51

jon-chuang force-pushed the jonch/fft-opt branch from c3eaafe to 7bf765f Compare March 30, 2021 14:58

Improve comments

8d7ddfb

jon-chuang force-pushed the jonch/fft-opt branch from 7bf765f to 8d7ddfb Compare March 30, 2021 15:22

Pratyush reviewed Mar 30, 2021

View reviewed changes

poly/src/domain/radix2/fft.rs Show resolved Hide resolved

Pratyush reviewed Mar 30, 2021

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

jon-chuang and others added 3 commits March 31, 2021 21:03

minor changes

84ab1ae

Merge branch 'master' into jonch/fft-opt

62ac380

fix

27ee1e6

Pratyush approved these changes Apr 8, 2021

View reviewed changes

Pratyush changed the title ~~Cache Alignment for Serial and Parallel FFT & IFFT~~ Cache alignment for serial and parallel FFT and IFFT Apr 8, 2021

Pratyush merged commit e504bda into master Apr 8, 2021

Pratyush deleted the jonch/fft-opt branch April 8, 2021 16:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache alignment for serial and parallel FFT and IFFT #245

Cache alignment for serial and parallel FFT and IFFT #245

jon-chuang commented Mar 24, 2021 •

edited

Loading

Pratyush commented Mar 24, 2021

ValarDragon commented Mar 24, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

Pratyush commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021

jon-chuang commented Mar 25, 2021

Pratyush commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021

Pratyush Mar 30, 2021

jon-chuang Mar 30, 2021 •

edited

Loading

Pratyush Mar 30, 2021

Pratyush commented Mar 30, 2021 •

edited

Loading

jon-chuang commented Mar 30, 2021

Pratyush commented Mar 30, 2021

jon-chuang commented Mar 30, 2021

ValarDragon commented Mar 30, 2021 •

edited

Loading

jon-chuang commented Mar 30, 2021

ValarDragon commented Mar 30, 2021 •

edited

Loading

Pratyush commented Mar 30, 2021

ValarDragon commented Apr 8, 2021 •

edited

Loading

Cache alignment for serial and parallel FFT and IFFT #245

Cache alignment for serial and parallel FFT and IFFT #245

Conversation

jon-chuang commented Mar 24, 2021 • edited Loading

Description

Pratyush commented Mar 24, 2021

ValarDragon commented Mar 24, 2021 • edited Loading

jon-chuang commented Mar 25, 2021 • edited Loading

jon-chuang commented Mar 25, 2021

jon-chuang commented Mar 25, 2021 • edited Loading

jon-chuang commented Mar 25, 2021 • edited Loading

jon-chuang commented Mar 25, 2021 • edited Loading

Pratyush commented Mar 25, 2021 • edited Loading

jon-chuang commented Mar 25, 2021

jon-chuang commented Mar 25, 2021

Pratyush commented Mar 25, 2021 • edited Loading

jon-chuang commented Mar 25, 2021

Pratyush Mar 30, 2021

Choose a reason for hiding this comment

jon-chuang Mar 30, 2021 • edited Loading

Choose a reason for hiding this comment

Pratyush Mar 30, 2021

Choose a reason for hiding this comment

Pratyush commented Mar 30, 2021 • edited Loading

jon-chuang commented Mar 30, 2021

Pratyush commented Mar 30, 2021

jon-chuang commented Mar 30, 2021

ValarDragon commented Mar 30, 2021 • edited Loading

jon-chuang commented Mar 30, 2021

ValarDragon commented Mar 30, 2021 • edited Loading

Pratyush commented Mar 30, 2021

ValarDragon commented Apr 8, 2021 • edited Loading

jon-chuang commented Mar 24, 2021 •

edited

Loading

ValarDragon commented Mar 24, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

jon-chuang commented Mar 25, 2021 •

edited

Loading

Pratyush commented Mar 25, 2021 •

edited

Loading

Pratyush commented Mar 25, 2021 •

edited

Loading

jon-chuang Mar 30, 2021 •

edited

Loading

Pratyush commented Mar 30, 2021 •

edited

Loading

ValarDragon commented Mar 30, 2021 •

edited

Loading

ValarDragon commented Mar 30, 2021 •

edited

Loading

ValarDragon commented Apr 8, 2021 •

edited

Loading