-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/madd #233
Feature/madd #233
Conversation
@peytondmurray Have you ran unit tests ( |
The first time I ran Here is the benchmark result: benchmark.txt |
@peytondmurray The bottom line is don't bother with the What I would like to ask you to do is to run bench without you patches to figure out any performance benefits. |
@godsic The benchmark uses the Heun solver, which is unaffected by this change. I instead ran the benchmark for each of the four different solvers above for both the |
@peytondmurray Would be possible for you to run the benchmark again? We are about to release 3.10 and I am happy to merge this one if it provides performance benefits. |
…there is any noticeable improvement.
…ing the new cuda.madd functions
d877b38
to
53a0217
Compare
@godsic Sorry to be slow - I'm preparing the benchmarks to run overnight, and will post results tomorrow. |
@godsic I rebased
The benchmarks were run on a GTX 1080 Ti. After running the simulations, I plotted the execution times via matplotlib; the first row is a scatter plot of all the raw execution times as a function of the number of cells N; the second row is the mean execution time as a function of N; and the final row is the ratio of the mean execution time on the And here are the raw execution times I recorded, formatted into csv files: Overall it looks like |
This script is used for the benchmarks mentioned in pull request #233. This script does not actively check anything, it takes a long time to run, and has no longer any purpose. There is no reason to have this as a 'unit test' in the test directory. Hence the file is removed.
The RK23, RK4, RK45, and RK56 solvers make use of the Madd2, Madd3, Madd4, Madd5, Madd6, and Madd7 functions, but only Madd2 and Madd3 are implemented as cuda kernels. The rest of these functions essentially just call nested combinations of Madd2 and Madd3 multiple times. At each timestep, the solvers are therefore launching more cuda kernels than needed each time Madd4, Madd5, Madd6, and Madd7 are being called. As I understand it, the overhead associated with launching cuda kernels can be large, and to a lesser extent there's also an overhead to calling Go functions.
I implemented cuda versions of Madd4, Madd5, Madd6, and Madd7, and modified the solvers to use these functions. The simple benchmark included in the test folder (sp4_madd_bench.mx3) shows basically no improvement for RK23 and RK4, but there is a few percent (~5%) improvement in the time it takes for
run()
to finish on my machine for RK45 and RK56.