How to run Cardinal on GPU? #802
Closed
tranm-ansto started this conversation in General
Replies: 1 comment 13 replies
-
Can you tell me more about how you tried to run on GPU? You will need to change the command you use to launch Cardinal. If you take a look at |
-
Hello,
I'm trying to run the 'sfr_7pin' example (link below) on GPU.
https://github.com/neams-th-coe/cardinal/tree/devel/tutorials/sfr_7pin
I ran the example on CPU and it ran through fine. However, when I tried running on GPU, I got the error below. I'm new to Cardinal and NekRS. I'm not sure if there is additional stuff needed for running simulations on GPU. Thank you for your help.
Regards,
Minh
Terminal output below:
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 6 nodes on boundary
' (30000). Other mesh has 6 nodes on boundary
' (31000).Minimum edge length on both surfaces is 0.0011547.
In UnstructuredMesh::stitch_meshes:
Found 6 matching nodes.
nekRS v23.0.7 (907edeac)
COPYRIGHT (c) 2019-2023 UCHICAGO ARGONNE, LLC
MPI tasks: 8
Initializing device
*** ERROR ***
---[ Error ]--------------------------------------------------------------------
File : /data2/tranm/software/cardinal/contrib/nekRS/3rd_party/occa/src/occa/internal/modes/cuda/device.cpp
Line : 31
Function : device
Message : Device: Creating Device
CUDA Error [ 101 ]: CUDA_ERROR_INVALID_DEVICE
Stack
27 occa::error(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
26 occa::cuda::error(cudaError_enum, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
25 occa::cuda::device::device(occa::json const&)
24 occa::cuda::cudaMode::newDevice(occa::json const&)
23 occa::device::setup(occa::json const&)
22 occa::device::setup(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
21 device_t::device_t(setupAide&, comm_t&)
20 platform_t::platform_t(setupAide&, ompi_communicator_t*, ompi_communicator_t*)
19 nekrs::setup(ompi_communicator_t*, ompi_communicator_t*, int, int, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int, int, int)
18 NekInitAction::act()
17 Action::timedAct()
16 ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
15 ActionWarehouse::executeAllActions()
14 MooseApp::runInputFile()
13 MultiApp::createApp(unsigned int, double)
12 MultiApp::createLocalApp(unsigned int)
11 MultiApp::createApps()
10 FEProblemBase::addMultiApp(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, InputParameters&)
9 Action::timedAct()
8 ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
7 ActionWarehouse::executeAllActions()
6 MooseApp::runInputFile()
5 MooseApp::run()
4 /home/tranm/software/cardinal/cardinal-opt(main+0x551)
3 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)
2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)
1 /home/tranm/software/cardinal/cardinal-opt(+0x104e5)
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
[moose106:526988] 6 more processes have sent help message help-mpi-api.txt / mpi-abort
[moose106:526988] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
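One thing that may be worth checking, given the log above: CUDA error 101 (`CUDA_ERROR_INVALID_DEVICE`) means the device ordinal that was requested does not correspond to a valid CUDA device, and the run used 8 MPI tasks. The sketch below is only a diagnostic under assumptions, not a confirmed cause. It assumes a single node, NVIDIA GPUs with `nvidia-smi` on the `PATH`, and that each MPI rank asks for the device whose ordinal equals its local rank. The helper names `count_visible_gpus` and `ranks_without_device` are hypothetical and are not part of Cardinal or NekRS.

```python
import shutil
import subprocess


def count_visible_gpus() -> int:
    """Count GPUs listed by `nvidia-smi -L`; return 0 if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    # Each physical GPU is reported on a line that starts with "GPU <index>:".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def ranks_without_device(num_ranks: int, num_gpus: int) -> list[int]:
    """Ranks that would request a nonexistent device.

    ASSUMPTION: device ordinal == local rank, with all ranks on one node.
    """
    return [rank for rank in range(num_ranks) if rank >= num_gpus]


if __name__ == "__main__":
    mpi_ranks = 8  # taken from "MPI tasks: 8" in the log above
    gpus = count_visible_gpus()
    bad = ranks_without_device(mpi_ranks, gpus)
    print(f"visible GPUs: {gpus}, MPI ranks: {mpi_ranks}")
    if bad:
        print(f"ranks with no matching device under the assumption above: {bad}")
    else:
        print("every rank has a matching device ordinal")
```

If the script reports fewer GPUs than MPI ranks, relaunching with at most one rank per GPU would be a reasonable first experiment.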