Profiling MPI and benchmarking strong + weak scaling #3002
-
Happy to help.
-
Thanks @ali-ramadhan for doing this. I wonder if we could modify this script and run it on …
-
Got some helpful replies on the Julia Discourse: https://discourse.julialang.org/t/how-to-profile-julia-mpi-code/57136/4
The leading suggestion, from @simonbyrne, is to try NVIDIA Nsight, which might let us do both GPU profiling and MPI profiling!
-
This registration is still open: https://portal.xsede.org/course-calendar/-/training-user/class/2310/session/3970
It's free and takes place on Thursday. I'm considering attending myself.
-
Thanks for the heads-up, just signed up!
-
Thanks, and me too!
-
@simone-silvestri has done a bit of this. @simone-silvestri, feel free to post your results here. I'm converting this to a discussion.
-
In PR #590 I added a small, quick strong scaling test, and @francispoulin calculated the scaling efficiency, which wasn't super great:
I guess to improve performance we should do some MPI profiling to find bottlenecks. We could also benchmark the distributed pressure solve and the halo filling separately to see how each of them scales.
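For reference, a common definition of strong scaling efficiency holds the global grid fixed and compares each run against the smallest one: E(N) = N0·T(N0) / (N·T(N)), where T(N) is the wall time on N ranks and N0 is the smallest rank count measured. The exact formula used for the PR #590 numbers isn't recorded in this thread, so treat this as an assumption. The helper below is a hypothetical Python sketch (Python only because it is language-neutral), not part of Oceananigans:

```python
def strong_scaling_efficiency(timings: dict[int, float]) -> dict[int, float]:
    """Strong scaling efficiency for a fixed global problem size (hypothetical helper).

    `timings` maps a rank count N to the measured wall time T(N) for the same
    global grid. Efficiency is N0 * T(N0) / (N * T(N)) relative to the smallest
    rank count N0; 1.0 means perfect linear speedup.
    """
    n0 = min(timings)
    baseline = n0 * timings[n0]  # total rank-seconds of the reference run
    return {n: baseline / (n * t) for n, t in sorted(timings.items())}


if __name__ == "__main__":
    # Made-up timings for illustration only, not measured Oceananigans results.
    example = {1: 8.0, 2: 4.0, 4: 2.5}
    for ranks, eff in strong_scaling_efficiency(example).items():
        print(f"{ranks} ranks: efficiency {eff:.2f}")
```

With those made-up timings the efficiencies are 1.0, 1.0 and 0.8: going from 2 to 4 ranks saved 1.5 s instead of the ideal 2 s. Running the same calculation on separately timed pressure-solve and halo-fill phases would show which one drags the total down.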
It might also make sense to benchmark scaling with `ShallowWaterModel` to see if it's an `IncompressibleModel` issue. We might need a pretty large domain to see good scaling with a 2D shallow water model?
@tomchor pointed out that the benchmark could be flawed. We should make sure everything is compiled before timing. We could also try different sizes and a weak scaling benchmark in case the 1D/slab decomposition isn't helping.
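Two of the points above can be sketched in a language-neutral way (Python here; the actual benchmark is Julia). First, run a few warm-up steps and discard them so one-time costs such as Julia's JIT compilation aren't timed. Second, weak scaling efficiency: the number of grid points per rank is held fixed, so ideal wall time stays constant and E(N) = T(N0) / T(N). Both helper names and the default warm-up and trial counts are hypothetical choices, not part of the existing benchmark script:

```python
import statistics
import time
from typing import Callable


def benchmark(step: Callable[[], None], warmup: int = 2, trials: int = 10) -> float:
    """Median wall time of `step`, excluding warm-up calls (hypothetical helper).

    Warm-up calls are discarded so one-time costs (in Julia, compilation on
    the first call) don't pollute the measurement.
    """
    if trials < 1:
        raise ValueError("need at least one timed trial")
    for _ in range(warmup):
        step()
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def weak_scaling_efficiency(timings: dict[int, float]) -> dict[int, float]:
    """Weak scaling efficiency with a fixed grid size per rank (hypothetical helper).

    Ideal wall time is constant, so efficiency is T(N0) / T(N) relative to the
    smallest rank count N0.
    """
    n0 = min(timings)
    return {n: timings[n0] / t for n, t in sorted(timings.items())}


if __name__ == "__main__":
    # Stand-in workload; a real run would call one model time step instead.
    median_time = benchmark(lambda: sum(range(100_000)))
    print(f"median step time: {median_time:.6f} s")
```

In an MPI run each rank would time its own steps; since the slowest rank sets the wall time of a step, reducing the per-rank times with a maximum is a common choice.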
Maybe try on a different machine too. Not sure if there's a "proper" setup for doing these scaling benchmarks.
Bad scaling efficiency might also be a sign of missing barriers/waits?
@vchuravy, we might ask for your help!