Why does my example run successfully on 4 compute nodes but fail on 8 compute nodes when I try to use GPU-aware MPI? #609
Unanswered
Terence-iscas asked this question in Q&A
Replies: 0 comments
Hi, I'm running my pipe example, which has nearly 650K hexahedral elements (polynomial order 7, NekRS 23.0). Each of my compute nodes has 4 CPUs and 4 AMD GPUs. On 4 compute nodes the example runs correctly with OpenMPI 4.1.5 and UCX: during the "timing gs" step, the pw+early+device, pw+device, and pw+host modes all appeared.
To test the strong scalability of this example, I then ran it on 8 compute nodes (same architecture). Unfortunately, it stopped at the first "timing gs:" step.
The error log looks like this:
I have located the function:

The variable `oogs_mode_list` appears to have the following 5 values:

If I use OpenMPI with UCX, it seems that `gsMode=OOGS_AUTO`, and the program then benchmarks the communication bandwidth. I also tried forcing `gsMode` to each of the other four options; with `OOGS_DEVICEMPI`, the same error occurred again. So my question is: is there a bug here, or are my MPI/UCX parameter settings wrong (see below)?

Launch command:
Besides, I have read the similar topics #594, #578, and #568. They gave me some ideas, but none of them worked, so I opened this topic.
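For anyone reproducing this, one quick sanity check (a sketch; it assumes `ucx_info` from the UCX install is on `PATH`, and output formats vary by UCX version) is to confirm that the UCX build on the failing nodes actually includes ROCm support:

```shell
#!/bin/sh
# Show UCX build configuration and look for ROCm-related options.
ucx_info -b | grep -i rocm

# List the transports/devices UCX can actually use on this node;
# rocm_copy / rocm_ipc should appear if GPU-aware transfers are possible.
ucx_info -d | grep -i rocm
```

If the ROCm transports are missing on some nodes but present on others, that could explain why the run succeeds at one node count and fails at another.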
Thank you in advance for your kind help! Best wishes!