Partial ROCm 3.1 support #3571

mkuron · 2020-03-10T11:21:31Z

Make CMake changes. Needs `ln -s /opt/rocm/bin/hcc* /opt/rocm/hip/bin/` to work around a bug in the hipcc linker wrapper. The build then succeeds, but anything that passes GPU pointers between compilation units (most notably EK and LB) fails, and all tests deadlock in the AMD HSA shutdown procedure.

codecov · 2020-03-10T11:44:25Z

Codecov Report

Merging #3571 into python will increase coverage by <1%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           python   #3571    +/-   ##
=======================================
+ Coverage      88%     88%   +<1%     
=======================================
  Files         524     524            
  Lines       23598   23598            
=======================================
+ Hits        20772   20774     +2     
+ Misses       2826    2824     -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/core/electrostatics_magnetostatics/p3m.cpp | `85% <0%> (-1%)` | ⬇️ |
| src/core/polymer.cpp | `98% <0%> (+5%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 729a677...f8f5643.

jngrad · 2020-03-10T12:13:26Z

CMakeLists.txt

@@ -187,13 +187,13 @@ if(WITH_CUDA)
      message(STATUS "Found HIP compiler: ${HIP_HIPCC_EXECUTABLE}")
      set(CUDA 1)
      set(HIP 1)
-      list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")
+      list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${HIP_ROOT_DIR}/../include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")


HIP_HCC_FLAGS expands to "-I/opt/rocm/include -I/opt/rocm/../include -Wno-c99..." on ROCm 3.0, but /opt/rocm/../include doesn't exist. Is there a way in CMake to find the path of the parent ROCm folder? This would allow us to write "-I${HIP_ROOT_DIR}/include -I${ROCM_ROOT_DIR}/include -Wno-c99..." and save us some headache when debugging this CMake logic.

Unfortunately there is not. So maybe we should merge your patch instead of this pull request that contains an explicit version check.

We could do like in pytorch/pytorch:torch/utils/cpp_extension.py#L53-L73, e.g.

string(REGEX REPLACE "^(.+)/bin/hipcc$" "\\1" ROCM_HOME ${HIP_HIPCC_EXECUTABLE})
list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${ROCM_HOME}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")

But this seems superfluous given that we already hardcode the path to /opt/rocm:

espresso/CMakeLists.txt

Line 182 in a4596e3

list(APPEND CMAKE_MODULE_PATH "/opt/rocm/hip/cmake")

Why not directly do the following?

set(ROCM_HOME "/opt/rocm")
list(APPEND CMAKE_MODULE_PATH "${ROCM_HOME}/hip/cmake")
# ...
list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${ROCM_HOME}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")

ROCM_HOME sounds like a sensible solution.
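
A minimal sketch of the ROCM_HOME idea, assuming the default prefix /opt/rocm. Making it a cache variable and guarding the extra include path with an existence check are illustrative additions that were not part of the patch discussed above; they would let the same logic run on ROCm 3.0 and 3.1 layouts and on non-standard install prefixes.

```cmake
# Assumption: ROCm is installed under /opt/rocm unless the user overrides
# the location with -DROCM_HOME=<path> on the command line.
set(ROCM_HOME "/opt/rocm" CACHE PATH "Path to the ROCm installation")
list(APPEND CMAKE_MODULE_PATH "${ROCM_HOME}/hip/cmake")

# Only add the top-level include directory if it actually exists, so that
# no dangling -I flag ends up on the compiler command line.
if(IS_DIRECTORY "${ROCM_HOME}/include")
  list(APPEND HIP_HCC_FLAGS "-I${ROCM_HOME}/include")
endif()
```

With this in place, a custom install could be selected with something like `cmake -DROCM_HOME=/path/to/rocm ..` instead of editing CMakeLists.txt.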

KaiSzuttor · 2020-03-10T13:20:07Z

I think in general we should try to keep these details out of the main CMakeLists.txt. Otherwise this file will be unreadable at some point. So whatever the solution is, please put it in a separate cmake module in the cmake directory of the project root.

mkuron · 2020-03-10T13:33:14Z

@KaiSzuttor, it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt. CUDA detection takes up just as much space.

KaiSzuttor · 2020-03-10T13:44:06Z

> it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt. CUDA detection takes up just as much space.

Can you please explain this statement? Just because there is a mess elsewhere in the file does not mean that we should skip the cleanup and extend the mess in another place.

jngrad · 2020-03-10T13:48:04Z

> So whatever the solution is, please put it in a separate cmake module in the cmake directory of the project root.

I'm currently trying to refactor the CMake logic for detecting CUDA. We are using the FindCUDA code that has been deprecated in CMake 3.10 and doesn't fully support CUDA 10.0+. For example, the cublas error that has been blocking the Stokesian Dynamics PR for weeks is resolved by simply using the CMake native CUDA support. As part of the refactor, I thought about moving some of the logic to dedicated *.cmake files in the espresso /cmake folder, e.g. the definitions of add_gpu_library and some of the library path detection and variable creation logic. However, we'll probably have to keep a minimal if/else structure in the main CMakeLists.txt file for the three CUDA compilers we detect.
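
For illustration, native CUDA support can be sketched roughly as follows. This is not the refactoring itself: the project name, target name and source file are hypothetical, and only the `WITH_CUDA` option is taken from this thread.

```cmake
cmake_minimum_required(VERSION 3.10)
project(NativeCudaSketch LANGUAGES CXX)  # hypothetical project name

option(WITH_CUDA "Build with GPU support" OFF)

if(WITH_CUDA)
  # check_language() sets CMAKE_CUDA_COMPILER if a CUDA compiler is found,
  # without aborting the configure step when none is available.
  include(CheckLanguage)
  check_language(CUDA)
  if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
  else()
    message(FATAL_ERROR "WITH_CUDA is set but no CUDA compiler was found")
  endif()

  # With CUDA enabled as a first-class language, .cu files are compiled by
  # a regular add_library() call instead of FindCUDA's cuda_add_library().
  add_library(gpu_sketch STATIC kernels.cu)  # hypothetical target and file
endif()
```

The if/else structure for the other GPU compilers mentioned above is left out of this sketch.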

KaiSzuttor · 2020-03-10T13:48:47Z

> it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt

no, it happens in the Find*.cmake files

mkuron · 2020-03-10T13:51:27Z

> no, it happens in the Find*.cmake files

Setting compiler flags, which is what we are currently discussing, shouldn't happen inside FindHIP.cmake. The choice of compiler flags is Espresso-specific. Find*.cmake files are supposed to be provided either by CMake or by the respective library to deal with things common to all use cases.

KaiSzuttor · 2020-03-10T13:52:50Z

Compiler flags should be set on targets, not globally, so they belong neither in the main CMake file nor in FindHIP.cmake.
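
A rough sketch of what target-scoped flags look like for a plain C++ target. The target and file names are hypothetical, `ROCM_HOME` is assumed to be defined elsewhere, and whether the `hip_add_library()` wrapper from FindHIP honors target properties in the same way is not addressed here.

```cmake
cmake_minimum_required(VERSION 3.10)
project(TargetFlagsSketch LANGUAGES CXX)  # hypothetical project name

# Hypothetical target and source file, for illustration only.
add_library(gpu_kernels_sketch STATIC kernels.cpp)

# PRIVATE keeps the settings local to this target instead of leaking them
# into every target defined in the directory.
target_include_directories(gpu_kernels_sketch PRIVATE "${ROCM_HOME}/include")
target_compile_options(gpu_kernels_sketch PRIVATE -Wno-macro-redefined)

# Request C++14 as a compile feature rather than a raw -std=c++14 flag.
target_compile_features(gpu_kernels_sketch PRIVATE cxx_std_14)
```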

KaiSzuttor · 2020-03-10T13:57:30Z

An introduction to modern cmake:

> You often want a cmake folder, with all of your helper modules. This is where your Find*.cmake files go.

jngrad · 2020-03-10T15:46:01Z

Please don't invest any more time in this PR; I'm including it in my CMake refactoring PR. It's taking more time than anticipated because the FindCython.cmake is also broken.

Description of changes:

- move logic to import packages from `CMakeLists.txt` to dedicated helper files `cmake/Find<package>.cmake` for `find_package()`
- enforce the Cython version requested in `CMakeLists.txt`
- CMake now fails if `WITH_CUDA` is set to true but no CUDA-capable compiler is found
- CMake now fails if `WITH_CLANG_TIDY` is set to true but Clang-tidy is not found or its version doesn't match the Clang compiler version
- drop deprecated `FindCUDA` in favor of native CUDA support in CMake 3.10 (required for #3445)
- add partial support for ROCm 3.1 (closes #3571, required for espressomd/docker#156)
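
As an illustration of the helper-file approach, a find module that enforces a requested version could look roughly like this. The file name `cmake/FindClangTidySketch.cmake` and all variable names are hypothetical, and this is not the module from the refactoring PR.

```cmake
# cmake/FindClangTidySketch.cmake (hypothetical file name)
find_program(CLANG_TIDY_EXECUTABLE NAMES clang-tidy)

if(CLANG_TIDY_EXECUTABLE)
  # Assumption: the output of --version contains "version X.Y.Z".
  execute_process(COMMAND ${CLANG_TIDY_EXECUTABLE} --version
                  OUTPUT_VARIABLE CLANG_TIDY_VERSION_OUTPUT)
  if(CLANG_TIDY_VERSION_OUTPUT MATCHES "version ([0-9]+\\.[0-9]+\\.[0-9]+)")
    set(CLANG_TIDY_VERSION "${CMAKE_MATCH_1}")
  endif()
endif()

# Reports success or failure in the standard way and compares the detected
# version against the one requested in the find_package() call.
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(
  ClangTidySketch
  REQUIRED_VARS CLANG_TIDY_EXECUTABLE
  VERSION_VAR CLANG_TIDY_VERSION)
```

The main `CMakeLists.txt` would then only need `list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")` followed by a call such as `find_package(ClangTidySketch 9.0 REQUIRED)`, which makes the configure step fail if the tool is missing or older than requested; adding `EXACT` would require an exact version match.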

Adapt to ROCm 3.1 path changes

105a41d

jngrad reviewed Mar 10, 2020

Introduce ROCM_HOME CMake variable

f8f5643

jngrad mentioned this pull request Mar 10, 2020

Refactor CMake package inclusion #3574

Merged

KaiSzuttor closed this Mar 12, 2020
