MCT configuration errors on Frontier with amdclanggpu when OMP_NUM_THREADS > 1 #6755
@trey-ornl Is this a known issue to you? |
Why would threading impact how autoconf queries the ftn compiler? |
@dqwu can you upload the |
Okay, the failure here is happening because
Without The The configure test shouldn't be confusing the device code linking step for an internal Fortran library linking step. I'll look to see if this is a custom test or standard one, and if we can steer it away from too aggressively scraping libXX.a |
That test is coming from autoconf proper, and while I think it probably should be changed, that is well outside the scope of what E3SM should do. Is there OpenMP Offload or OpenACC anywhere in the code that gets built for Frontier? If so, we need to find a way to handle this problem decisively. If not, you can probably just not have |
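To illustrate the kind of scraping being described, here is a minimal sketch. It is not autoconf's actual implementation (that is shell/M4 and considerably more involved); it only shows how a parser that collects `-L`, `-l`, and `*.a` tokens from verbose compiler output can pick up an archive from a device link step. The function name `scrape_link_flags` and every path and library name in the sample are hypothetical.

```python
def scrape_link_flags(verbose_output):
    """Collect tokens that look like link inputs from verbose compiler output.

    Simplified illustration only: keeps -L/-l flags and static archives,
    in first-seen order, without duplicates.
    """
    flags = []
    for line in verbose_output.splitlines():
        for token in line.split():
            looks_like_link_input = (
                token.startswith("-L")
                or token.startswith("-l")
                or token.endswith(".a")
            )
            if looks_like_link_input and token not in flags:
                flags.append(token)
    return flags


# Hypothetical verbose output: one host link line, one device link line.
SAMPLE = (
    '"/hypothetical/bin/ld.lld" -L/hypothetical/host/lib -lhostrt -lm\n'
    '"/hypothetical/bin/lld" /hypothetical/device/libdevice_stub.a -o a.out.device\n'
)

if __name__ == "__main__":
    for flag in scrape_link_flags(SAMPLE):
        print(flag)
```

In this sketch the archive from the second (device) line ends up in the same list as the host libraries, which is the sort of confusion between link steps described above.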
There is OpenACC in the code but we aren't currently building cases that use it on Frontier. I will ask around to see what our future plans are for those cases on Frontier. |
I have a broader question. Is anyone really using amdclanggpu for real runs on Frontier? |
Doesn't look like it, which means we could also just ignore this. |
@abbotts, I am trying to implement what you explained for cases where E3SM does not have OpenMP Offload or OpenACC. The
Are you suggesting removing only the module load for craype-accel-amd-gfx90a, or should other changes be made as well? |
I suggest you only remove the module load. If all the code you want to run on the AMD GPUs on Frontier is contained in HIP or C++ files then the lines you point out in The failure is happening specifically in the logic autoconf uses to get Fortran compiler internal libraries that it needs to link, so those flags for the C++ and HIP compilers shouldn't trigger this configure error. |
@abbotts, after removing the module, the issue no longer occurred and the model build completed successfully. Thank you! |
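For anyone wanting to confirm whether the module is active in their environment before configuring, a small check along these lines may help. It is a sketch that assumes the module system exports the colon-separated `LOADEDMODULES` variable (Environment Modules and Lmod both do); the helper name `module_is_loaded` is hypothetical, and the module name is the one discussed in this thread.

```python
import os

# Module name taken from this thread.
ACCEL_MODULE = "craype-accel-amd-gfx90a"


def module_is_loaded(name, loaded=None):
    """Return True if `name` appears in a LOADEDMODULES-style string.

    `loaded` defaults to the LOADEDMODULES environment variable (assumed
    to be colon-separated). Entries may be bare names or "name/version".
    """
    if loaded is None:
        loaded = os.environ.get("LOADEDMODULES", "")
    for entry in loaded.split(":"):
        if entry == name or entry.startswith(name + "/"):
            return True
    return False


if __name__ == "__main__":
    state = "loaded" if module_is_loaded(ACCEL_MODULE) else "not loaded"
    print(f"{ACCEL_MODULE} is {state}")
```

If it reports the module as loaded, the MCT configure step may hit the error described in this issue.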
* Removed the module to resolve the issue: #6755
* May or may not need to be restored to support OpenMP Offload or OpenACC
This issue is reproducible with the AMD compiler amdclanggpu (the amdclang compiler works).
It is not reproducible when OMP_NUM_THREADS = 1.
Steps to Reproduce on Frontier
MCT Configuration Errors