AMREX Cuda issue with Cuda 11.2/11.3/11.6/11.7 #3598
Could you provide more information? What does [...]? What does [...]? What does [...]? What does [...]? How do you build the test? If you do this in [...], what do you get in [...]? The error message says "See Backtrace.0 file for details". Could we see that file? |
Hi WeiqunZhang, below are the requested outputs: output of nvidia-smi, free, git log, git diff HEAD, make.ou, Backtrace.0 |
Could you run the two git commands in the amrex directory? I am trying to see which version of amrex you are using and whether there are any local changes. From the backtrace file, it seems that it dies at the first GPU kernel. The issue might be that the driver is incompatible with the CUDA toolkit. See https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id7 |
@nishaag your drivers are too old. For 11.2 for instance, you need at least 460.27.04. There is some compatibility that relaxes this strict constraint since the 455.23.05 driver series (~CUDA 11.1+), but your currently installed system drivers are also too old for that. |
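The driver requirement described above can be checked with a short script. This is a minimal sketch, not part of AMReX or of the thread: the helper names `parse_version`, `driver_at_least`, and `installed_driver_version` are hypothetical, and the minimum version `460.27.04` for CUDA 11.2 is simply the figure quoted in this thread. The query itself uses the `nvidia-smi --query-gpu=driver_version --format=csv,noheader` option.

```python
import shutil
import subprocess

# Minimum driver for CUDA 11.2 as quoted in this thread (assumption: taken
# from the comment above, not re-checked against NVIDIA's release notes).
MIN_DRIVER_FOR_CUDA_11_2 = "460.27.04"


def parse_version(text):
    """Turn '460.27.04' into (460, 27, 4); stops at non-numeric parts such as 'xx'."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def driver_at_least(installed, required):
    """True if the installed driver version is >= the required one.

    Versions are compared as integer tuples, so a shorter version that is a
    prefix of the required one (e.g. '460.27') counts as older.
    """
    return parse_version(installed) >= parse_version(required)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; returns None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if result.returncode == 0 and lines else None


# The two driver series mentioned in this thread:
print(driver_at_least("450.236.xx", MIN_DRIVER_FOR_CUDA_11_2))  # False
print(driver_at_least("470.xxx", MIN_DRIVER_FOR_CUDA_11_2))     # True
```

On a machine with a GPU, `installed_driver_version()` can be passed as the first argument instead of a hard-coded string. |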
@WeiqunZhang If this were due to a driver that is incompatible with the CUDA toolkit, it should work with CUDA toolkit 11.0 and the driver (450.236.xx) that we have on the system. Unfortunately, it does not work. Output of the git commands in the AMReX directory: git log n, git diff HEAD |
@ax3l If this were due to a driver that is incompatible with the CUDA toolkit, it should work with CUDA toolkit 11.0 and the driver (450.236.xx) that we have on the system. Unfortunately, it does not work. |
I don't have any explanation. |
Hello, I encountered the same error with CUDA 11.7. Have you solved it? |
First of all, I don't believe the issue is in AMReX. I have tested the current amrex on various machines with CUDA 11.x and 12.x. They all work just fine. On my workstation using Ubuntu, I have had various issues with the CUDA installation in the past. Sometimes a simple reboot could resolve the issue. Sometimes upgrading CUDA helped. Sometimes, I had to remove all packages containing the word nvidia or cuda, and then reinstall CUDA. This last resort has always worked for me. |
Hi SuperDNY, I found that the issue is the NVIDIA device driver version installed on the system: it does not work with driver version 450.236.xxx, but it does work with driver version 470.xxx.
I built the AMReX/amrex/Tests/GPU/Vector code for an NVIDIA A100 GPU with the command make CUDA_ARCH=80. It built successfully but threw the error below at runtime. I tried CUDA versions 11.2, 11.3, 11.6, and 11.7 and hit the same issue every time. Please help.
[/AMReX/amrex/Tests/GPU/Vector]$ ./main3d.gnu.CUDA.ex inputs
Initializing CUDA...
CUDA initialized with 1 device.
amrex::Abort::0::GPU last error detected in file ../../..//Src/Base/AMReX_GpuLaunchFunctsG.H line 885: invalid argument !!!
SIGABRT
See Backtrace.0 file for details
(cuda-11.7) aglnisha@scn37-mn:~/AMReX/amrex/Tests/GPU/Vector$ exit
exit