Multithreaded array initialization #68
Thanks for the contribution. I guess having something in PS for the Threads backend to control pinning and the mapping of threads to cores (or to have a close-to-optimal default) would be great! Especially for AMD CPUs with many NUMA regions, where this becomes significant.
BTW, @omlins, depending on how easy or difficult it would be to give me test access to Piz Daint, I could run some benchmarks there as well.
@carstenbauer, as Ludovic has probably told you already, Piz Daint does not have any AMD CPUs. Thus, for testing this, Superzack, Ludovic's cluster, will be better.
I quickly tested another example, namely https://github.com/omlins/ParallelStencil.jl/blob/main/miniapps/acoustic3D.jl (with the visualization/animation part commented out). Same configuration as above, i.e. a 64-core node of Noctua 2 with 64 Julia threads that I pinned compactly. Below are the timings of the
This corresponds to about a 2.4x speedup. (cc @luraess)
This also relates to #53 (comment)
What's holding back merging this?
Bump
@omlins bump
For better performance on systems with multiple NUMA domains. See my extensive comment on discourse.
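To illustrate the idea, here is a minimal sketch of multithreaded ("first-touch") initialization. It assumes the usual Linux behavior where a memory page is placed in the NUMA domain of the thread that first writes to it, and that threads are pinned. `threaded_zeros` is a hypothetical helper made up for this illustration; it is not the ParallelStencil API and not necessarily how this PR implements it.

```julia
using Base.Threads

# Hypothetical helper (illustration only, not part of ParallelStencil):
# allocate without initializing, then let every thread write its own chunk.
function threaded_zeros(::Type{T}, dims::Int...) where {T}
    A = Array{T}(undef, dims...)        # no pages are touched yet
    # :static gives each thread one fixed, contiguous chunk of indices, so the
    # thread that first writes a page is predictable across runs.
    @threads :static for i in eachindex(A)
        A[i] = zero(T)
    end
    return A
end

A = threaded_zeros(Float64, 64, 64, 64)
```

The placement only pays off if the compute kernels later traverse the array with the same static thread-to-index mapping; with a different access pattern, threads may still end up reading from remote NUMA domains.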
With this PR, I get about 40% speedup for this example (with USE_GPU=false) when using a full AMD Zen3 CPU (64 cores, 4 NUMA domains) of Noctua 2.
Timings (s) before
Timings (s) after
Speedup in %
NOTES:
@zeros and co, analyze its structure and then initialize "accordingly". But that's difficult...)
cc @luraess @omlins
PS: Working on it at the GPU4GEO Hackathon in the Schwarzwald 😉