
Fix degree bug, and add benchmarks of polynomial add/sub/neg #119

Merged: 7 commits from parallelize_dense_poly_ops into master on Dec 8, 2020

Conversation

@ValarDragon (Member) commented Dec 7, 2020

Description

Parallelize Add, Sub, AddAssign, SubAssign, Neg operations on DensePolynomial.

Furthermore, fix a bug in the degree calculation when dense polynomial addition / subtraction causes the leading coefficient to become 0.
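
As a rough illustration of that fix (a minimal sketch with an assumed helper name, not the actual patch): the idea is to strip trailing zero coefficients after an add/sub so that the reported degree matches the true degree.

    use ark_ff::Zero;

    // Drop trailing zero coefficients so that `coeffs.len() - 1` is the true degree.
    fn truncate_trailing_zeros<F: Zero>(coeffs: &mut Vec<F>) {
        while coeffs.last().map_or(false, |c| c.is_zero()) {
            coeffs.pop();
        }
    }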


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

@Pratyush (Member) commented Dec 7, 2020

Hmm I wonder if parallelizing these cheap operations would give better speedups if we used this to increase the amount of computation on each thread
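
(A minimal sketch of that idea, assuming "this" refers to rayon's with_min_len, which the follow-up benchmarks use: cap how finely rayon may split the coefficient vectors so each task gets a large enough chunk to amortize the scheduling overhead. The function name and the 1024 threshold are illustrative only.)

    use ark_ff::Field;
    use rayon::prelude::*;

    // Add `src` into `dst` coefficient-wise, but hand each rayon task at least
    // 1024 coefficients so the per-task work outweighs the thread-scheduling cost.
    fn add_assign_coeffs<F: Field>(dst: &mut [F], src: &[F]) {
        dst.par_iter_mut()
            .with_min_len(1 << 10)
            .zip(src)
            .for_each(|(a, b)| *a += *b);
    }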

@ValarDragon (Member, Author) commented Dec 8, 2020

I'll add benchmarks to see how it behaves with and without that.
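
(For reference, a rough sketch of how such a criterion benchmark could be set up; the crate paths, trait names, and sizes here are assumptions for illustration, not the benchmark code added in this PR.)

    use ark_bls12_381::Fr;
    use ark_poly::{univariate::DensePolynomial, DenseUVPolynomial};
    use ark_std::test_rng;
    use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

    // Benchmark dense-polynomial addition over sizes 4, 16, ..., 262144,
    // mirroring the sizes reported in the results below.
    fn bench_add_polynomial(c: &mut Criterion) {
        let mut rng = test_rng();
        for log_size in [2u32, 4, 6, 8, 10, 12, 14, 16, 18] {
            let size = 1usize << log_size;
            let p = DensePolynomial::<Fr>::rand(size - 1, &mut rng);
            let q = DensePolynomial::<Fr>::rand(size - 1, &mut rng);
            c.bench_with_input(BenchmarkId::new("add_polynomial", size), &size, |b, _| {
                b.iter(|| &p + &q)
            });
        }
    }

    criterion_group!(benches, bench_add_polynomial);
    criterion_main!(benches);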

@ValarDragon (Member, Author) commented Dec 8, 2020

Based on benchmarks for small-sized polynomials, it definitely looks like we should be doing that for small polynomials. Results of parallel addition speeds on the benchmark server:

"bls12_381" - add_polynomial/4
                        time:   [220.19 us 221.88 us 223.44 us]
"bls12_381" - add_polynomial/16
                        time:   [283.37 us 285.74 us 287.92 us]
"bls12_381" - add_polynomial/64
                        time:   [191.32 us 195.84 us 199.98 us]
"bls12_381" - add_polynomial/256
                        time:   [156.59 us 157.25 us 157.87 us]
"bls12_381" - add_polynomial/4096
                        time:   [308.35 us 313.17 us 317.55 us]
"bls12_381" - add_polynomial/1024
                        time:   [214.63 us 216.30 us 217.81 us]                                                          
"bls12_381" - add_polynomial/16384
                        time:   [570.16 us 572.52 us 574.73 us]
"bls12_381" - add_polynomial/65536
                        time:   [1.0938 ms 1.1022 ms 1.1127 ms]
"bls12_381" - add_polynomial/262144
                        time:   [3.6081 ms 3.6522 ms 3.6979 ms]

For reference, here are the serial times:

"bls12_381" - add_polynomial/4
                        time:   [37.483 ns 38.156 ns 38.833 ns]
"bls12_381" - add_polynomial/16
                        time:   [76.012 ns 77.167 ns 78.320 ns]
"bls12_381" - add_polynomial/64
                        time:   [272.49 ns 274.50 ns 276.44 ns]
"bls12_381" - add_polynomial/256
                        time:   [1.0792 us 1.0849 us 1.0905 us]
"bls12_381" - add_polynomial/1024
                        time:   [4.5605 us 4.5762 us 4.5916 us]
"bls12_381" - add_polynomial/4096
                        time:   [23.736 us 23.813 us 23.894 us]
"bls12_381" - add_polynomial/16384
                        time:   [156.44 us 157.11 us 158.31 us]
"bls12_381" - add_polynomial/65536
                        time:   [725.35 us 726.04 us 726.87 us]
"bls12_381" - add_polynomial/262144
                        time:   [3.2842 ms 3.2916 ms 3.3022 ms]

Should we block merging this PR until we update ark_std for this? The overhead of parallelization here is huge at small sizes.

@ValarDragon (Member, Author) commented Dec 8, 2020

with_min_len=64 benchmarks:
(we get similar numbers with with_min_len=4096; a lot of the overhead is just due to attempting parallelism at all, I guess)

"bls12_381" - add_polynomial/1
                        time:   [95.391 us 96.408 us 97.278 us]
                        change: [-9.8377% -8.8342% -7.9433%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/4
                        time:   [174.42 us 175.61 us 176.70 us]
                        change: [-11.489% -9.2890% -7.0652%] (p = 0.00 < 0.05)
bls12_381" - add_polynomial/16
                        time:   [223.92 us 225.63 us 227.20 us]
                        change: [-19.895% -18.767% -17.660%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/64
                        time:   [212.25 us 216.25 us 219.94 us]
                        change: [+10.884% +13.254% +15.973%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/256
                        time:   [244.27 us 247.81 us 251.16 us]
                        change: [+53.078% +55.495% +57.799%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/1024
                        time:   [181.77 us 182.84 us 183.86 us]
                        change: [-15.485% -14.604% -13.712%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/4096
                        time:   [199.15 us 202.84 us 206.36 us]
                        change: [-36.096% -34.938% -33.571%] (p = 0.00 < 0.05)                                                          
"bls12_381" - add_polynomial/16384
                        time:   [384.39 us 387.86 us 391.42 us]
                        change: [-33.035% -31.715% -30.393%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/65536
                        time:   [965.96 us 975.14 us 985.47 us]
                        change: [-18.815% -14.592% -10.073%] (p = 0.00 < 0.05)                                                         
"bls12_381" - add_polynomial/262144
                        time:   [3.1964 ms 3.2494 ms 3.3045 ms]

@ValarDragon (Member, Author) commented Dec 8, 2020

Hrmm, looks like with_min_len doesn't really fix the problem.

We should probably add an ark_std macro cfg_iter_with_min_size!(v, minimum_length) that is equivalent to:

#[cfg(feature = "parallel")]
if v.len() > minimum_length {
    cfg_iter!(v).with_min_len(minimum_length)
} else {
    // fall back to the sequential iterator
}

And with the minimum length being really large (like 2^20)? (A rough sketch of such a macro is below.)

Alternatively, I can just undo the parallelization in this PR and only copy over the polynomial benchmark infra and bug fixes.
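
For concreteness, here is a rough, hypothetical sketch of what a cfg_iter_with_min_size! macro could look like (an illustration of the idea, not existing ark_std code; it assumes rayon's prelude is in scope at the call site, and it sidesteps the branch-on-length version above, whose two arms would have different iterator types):

    #[macro_export]
    macro_rules! cfg_iter_with_min_size {
        ($v:expr, $min_len:expr) => {{
            // With the `parallel` feature, keep at least `$min_len` items per
            // rayon split, bounding how finely the work can be subdivided.
            #[cfg(feature = "parallel")]
            let iter = $v.par_iter().with_min_len($min_len);

            // Without the feature, fall back to the ordinary sequential iterator.
            #[cfg(not(feature = "parallel"))]
            let iter = $v.iter();

            iter
        }};
    }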

@ValarDragon (Member, Author) commented:
Something seems very wrong with these add_polynomial benchmarks. Polynomial evaluation, which is N multiplications, is 8x faster:

"bls12_381" - add_polynomial/262144
                        time:   [3.1964 ms 3.2494 ms 3.3045 ms]
"bls12_381" - evaluate_polynomial/262144
                        time:   [406.14 us 408.14 us 410.51 us]

@Pratyush (Member) commented Dec 8, 2020

Hmm I think the overhead of parallelization is causing issues here.

I think for these ops it's fine to leave them as not parallel, because they'll be dominated by other stuff anyway.

@ValarDragon (Member, Author) commented:
Yeah, that sounds good to me. I'll remove the parallelization, but keep the bug fixes + benchmarks.

Review comment on this diff excerpt:

.zip(&self.coeffs)
.for_each(|(a, b)| {
    *a += b;
});
@ValarDragon (Member, Author):
Should I undo changes like this?

@Pratyush (Member):
nah this is fine

@Pratyush Pratyush changed the title Fix degree bug, and add parallel implementations of add/sub/neg Fix degree bug, and add benchmarks of polynomial add/sub/neg Dec 8, 2020
@Pratyush Pratyush merged commit 651a557 into master Dec 8, 2020
@Pratyush Pratyush deleted the parallelize_dense_poly_ops branch December 8, 2020 22:08