GPU configuration: The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck #1199

trapprb8 · 2024-02-06T10:15:24Z

When I start a simulation in gpu mode I get the following error message:

Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck

I am trying to configure for a Quadro P4000, which is a Pascal-architecture card, so I guess cuda8 should be used in the configuration (according to the overview in the Makefile, see below)?

I used the following commands:

$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make

Overview in makefile:

# CUDA architecture / code version
# Fermi   (not supported): -gencode=arch=compute_10,code=sm_10
# Tesla   (Tesla C2050, GeForce GTX 480): -gencode=arch=compute_20,code=sm_20
# Tesla   (cuda4, K10, Geforce GTX 650, GT 650m): -gencode=arch=compute_30,code=sm_30
# Kepler  (cuda5, K20) : -gencode=arch=compute_35,code=sm_35
# Kepler  (cuda6.5, K80): -gencode=arch=compute_37,code=sm_37
# Maxwell (cuda6.5+/cuda7, Quadro K2200): -gencode=arch=compute_50,code=sm_50
# Pascal  (cuda8,P100, GeForce GTX 1080, Titan): -gencode=arch=compute_60,code=sm_60
# Volta   (cuda9, V100): -gencode=arch=compute_70,code=sm_70
# Turing  (cuda10, T4, GeForce RTX 2080): -gencode=arch=compute_75,code=sm_75
# Ampere  (cuda11, A100, GeForce RTX 3080): -gencode=arch=compute_80,code=sm_80
# Hopper  (cuda12, H100): -gencode=arch=compute_90,code=sm_90


danielpeter · 2024-02-07T00:20:59Z

the Quadro P4000 has CUDA compute capability 6.1. that means you will likely have to modify the Makefile a bit after configuration and instead of

-gencode=arch=compute_60,code=sm_60

use:

-gencode=arch=compute_61,code=sm_61
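
As a rough sketch of that mapping, the flag can be derived from the compute capability by dropping the dot. The helper name `gencode_flag` is hypothetical and not part of SPECFEM; it only builds the string and does not touch any Makefile.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPECFEM): turn a compute capability
# such as "6.1" into the matching nvcc -gencode flag.
gencode_flag() {
    cc=$(printf '%s' "$1" | tr -d '.')   # "6.1" -> "61"
    printf '%s\n' "-gencode=arch=compute_${cc},code=sm_${cc}"
}

gencode_flag 6.1   # prints -gencode=arch=compute_61,code=sm_61
```

The printed string is what would replace the compute_60 entry shown above.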

trapprb8 · 2024-02-07T11:45:11Z

Thank you for your answer! :)
Unfortunately, that didn't work yet, the error stays the same.
What I did now was:

in Makefile.in:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
in Makefile:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
GENCODE = $(GENCODE_60) $(FC_DEFINE)GPU_DEVICE_Pascal # this line stays the same, shown only for completeness
and run
$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make

danielpeter · 2024-02-07T14:34:37Z

great, thanks for the quick feedback!

note that the Makefile gets created by running the ./configure script. so, you would only need to modify either the Makefile.in before running the configuration, or the Makefile after running the configuration.
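
Since ./configure regenerates the Makefile from Makefile.in, it can help to check which of the two files actually carries the edit before running make. A minimal sketch, assuming it is run from the root directory; `has_arch` is a hypothetical helper and not something SPECFEM provides.

```shell
#!/bin/sh
# Hypothetical check: does FILE ($1) mention compute architecture $2?
has_arch() {
    grep -qF "arch=compute_$2" "$1"
}

# Report on both files if they exist in the current directory.
for f in Makefile.in Makefile; do
    if [ -f "$f" ]; then
        if has_arch "$f" 61; then
            echo "$f: compute_61 found"
        else
            echo "$f: compute_61 missing"
        fi
    fi
done
```

If the generated Makefile reports "missing" after a fresh ./configure, the edit was probably made only in the Makefile and then overwritten.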

trapprb8 · 2024-02-07T15:13:48Z

Hi Daniel, thanks again! :)
I also did this, however it does not work. Still the same error.
We are only talking about the Makefile.in and Makefile in the main directory, right?
I uploaded the two files:

Makefile.txt
Makefile.in.txt

danielpeter · 2024-02-07T15:50:10Z

yes, the GPU architecture is specified only in the main Makefiles in the root directory, Makefile.in and the generated one Makefile.

can you be more specific about what did not work: the compilation with the modifications you described, or the modification of only one of the Makefiles? that is, do you still get the error

Error in setConst_hprime_xx: invalid device symbol

even with the modification

GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"

in these Makefiles? if so, then what are your CUDA toolkit and CUDA driver versions?

trapprb8 · 2024-02-07T16:23:46Z

Exactly, the error is the same as before:

Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck

The CUDA version is 11.8; nvcc --version gives me:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

danielpeter · 2024-02-07T19:34:26Z

could you also add the output of the command nvidia-smi to see the driver version on your system?

trapprb8 · 2024-02-08T08:31:24Z

This output is:


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:05:00.0  On |                  N/A |
| 46%   30C    P0    28W / 105W |    240MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2980      G   /usr/lib/xorg/Xorg                192MiB |
|    0   N/A  N/A      3511      G   cinnamon                           30MiB |
|    0   N/A  N/A      4673      G   /usr/lib/firefox/firefox           13MiB |
+-----------------------------------------------------------------------------+

danielpeter · 2024-02-09T13:07:34Z

tricky... according to the toolkit documentation, that driver version looks okay for CUDA 11.8 and it should support the compute capability 6.1. unfortunately, I can't reproduce it as I don't have access to such a GPU card. the code works however on most older and newer cards, so I would expect this to be a driver version and CUDA toolkit issue.
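
The version comparison can be sketched in a few lines of shell. The "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, so the toolkit release reported by nvcc should not exceed it. The function names below are hypothetical, and the sample strings are copied from the outputs posted in this thread; on a real system one would pipe `nvcc --version` and `nvidia-smi` into the two parsers instead.

```shell
#!/bin/sh
# Hypothetical parsers: both read command output on stdin.
toolkit_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds if MAJOR.MINOR version $1 is less than or equal to $2.
version_le() {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    [ "$a_major" -lt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -le "$b_minor" ]; }
}

# Sample lines taken from the outputs quoted in this thread.
nvcc_line='Cuda compilation tools, release 11.8, V11.8.89'
smi_line='| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |'

tk=$(printf '%s\n' "$nvcc_line" | toolkit_version)
drv=$(printf '%s\n' "$smi_line" | driver_cuda_version)

if version_le "$tk" "$drv"; then
    echo "toolkit $tk is within the driver's supported CUDA version $drv"
else
    echo "toolkit $tk is newer than the driver's supported CUDA version $drv"
fi
```

For the values in this thread the check passes, which matches the observation that the driver looks fine for CUDA 11.8; it does not by itself explain the error.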

to double check the compute capability of your card, could you compile and run the little helper tool in utils/GPU_tools/ folder on your system:

cd ~/<specfem-directory>/utils/GPU_tools/
nvcc --gpu-architecture=sm_60 -o check_cuda_device check_cuda_device.cu
./check_cuda_device

the tool will provide an info output with the compute capability listed.

in the past, somebody on the CIG-seismo forum was able to run the code on a Quadro P6000, I think with CUDA version 9.1. you could try to downgrade the CUDA driver & runtime version to see if this solves the issue.

trapprb8 · 2024-02-09T16:33:18Z

Hi Daniel,

here is the output of the helper tool:

found number of CUDA devices = 1

GPU device id: 0
Device Name = Quadro P4000

memory:
totalGlobalMem (in MB, dividing by powers of 1024): 8116.562500
totalGlobalMem (in GB, dividing by powers of 1024): 7.926331
totalGlobalMem (in MB, dividing by powers of 1000): 8510.833008
totalGlobalMem (in GB, dividing by powers of 1000): 8.510833
sharedMemPerBlock (in bytes): 49152

blocks:
Maximum number of registers per block: 65536
Maximum number of threads per block: 1024
Maximum size of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535

features:
Compute capability of the device = 6.1
multiProcessorCount: 14
canMapHostMemory: TRUE
deviceOverlap: TRUE

0: GPU memory usage (dividing by powers of 1024): used = 319.625000 MB, free = 7796.937500 MB, total = 8116.562500 MB
0: GPU memory usage (dividing by powers of 1000): used = 335.151104 MB, free = 8175.681536 MB, total = 8510.832640 MB

number of total devices: 1
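
To avoid reading the value off by eye, the compute capability line can be pulled out of this output with sed. A small sketch, assuming the tool prints to standard output in the format shown above; `parse_cc` is a hypothetical name, and the sample text is a shortened copy of the output in this comment.

```shell
#!/bin/sh
# Hypothetical parser: reads check_cuda_device output on stdin and prints
# the first compute capability it finds, e.g. "6.1".
parse_cc() {
    sed -n 's/^ *Compute capability of the device = \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Shortened sample of the output posted above.
sample='Device Name = Quadro P4000
features:
Compute capability of the device = 6.1
multiProcessorCount: 14'

cc=$(printf '%s\n' "$sample" | parse_cc)
echo "compute capability: $cc"                         # prints compute capability: 6.1
echo "arch suffix: $(printf '%s' "$cc" | tr -d '.')"   # prints arch suffix: 61
```

On a real system the sample would be replaced by `./check_cuda_device | parse_cc`; the resulting suffix 61 is the one used in the compute_61/sm_61 flags discussed earlier.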

Ok.. Maybe I will try to downgrade the Cuda Toolkit then!

Zzhe0315-RA · 2024-11-12T18:55:42Z

Tricky...Is this problem solved?

Zzhe0315-RA · 2024-11-12T18:57:17Z

Tricky...Is this problem solved?

I have the same problem as you and tried to deal with it the same way, but I failed.

trapprb8 · 2024-11-13T08:00:18Z

> Tricky...Is this problem solved?
>
> I have the same problem as you and tried to deal with it the same way, but I failed.

Hey:) No, unfortunately it was not solved..

danielpeter mentioned this issue Feb 7, 2024

The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck SPECFEM/specfem3d#1661

Open
