Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial ROCm 3.1 support #3571

Closed
wants to merge 2 commits into from
Closed

Partial ROCm 3.1 support #3571

wants to merge 2 commits into from

Conversation

mkuron
Copy link
Member

@mkuron mkuron commented Mar 10, 2020

Make CMake changes. Needs ln -s /opt/rocm/bin/hcc* /opt/rocm/hip/bin/ to work around bug in hipcc linker wrapper. Build then succeeds, but anything that passes GPU pointers between compilation units (most notably, EK and LB) fails and all tests deadlock in the AMD HSA shutdown procedure.

@codecov
Copy link

codecov bot commented Mar 10, 2020

Codecov Report

Merging #3571 into python will increase coverage by <1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           python   #3571    +/-   ##
=======================================
+ Coverage      88%     88%   +<1%     
=======================================
  Files         524     524            
  Lines       23598   23598            
=======================================
+ Hits        20772   20774     +2     
+ Misses       2826    2824     -2
Impacted Files Coverage Δ
src/core/electrostatics_magnetostatics/p3m.cpp 85% <0%> (-1%) ⬇️
src/core/polymer.cpp 98% <0%> (+5%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 729a677...f8f5643. Read the comment docs.

CMakeLists.txt Outdated
@@ -187,13 +187,13 @@ if(WITH_CUDA)
message(STATUS "Found HIP compiler: ${HIP_HIPCC_EXECUTABLE}")
set(CUDA 1)
set(HIP 1)
list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")
list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${HIP_ROOT_DIR}/../include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIP_HCC_FLAGS expands to "-I/opt/rocm/include -I/opt/rocm/../include -Wno-c99..." on ROCm 3.0, but /opt/rocm/../include doesn't exist. Is there a way in CMake to find the path of the parent ROCm folder? This would allow us to write "-I${HIP_ROOT_DIR}/include -I${ROCM_ROOT_DIR}/include -Wno-c99..." and save us some headache when debugging this CMake logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately there is not. So maybe we should merge your patch instead of this pull request that contains an explicit version check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do like in pytorch/pytorch:torch/utils/cpp_extension.py#L53-L73, e.g.

      string(REGEX REPLACE "^(.+)/bin/hipcc$" "\\1" ROCM_HOME ${HIP_HIPCC_EXECUTABLE})
      list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${ROCM_HOME}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")

But this seems superfluous given that we already hardcode the path to /opt/rocm:

list(APPEND CMAKE_MODULE_PATH "/opt/rocm/hip/cmake")

Why not directly do the following?

set(ROCM_HOME "/opt/rocm")
list(APPEND CMAKE_MODULE_PATH "${ROCM_HOME}/hip/cmake") 
# ...
list(APPEND HIP_HCC_FLAGS "-I${HIP_ROOT_DIR}/include -I${ROCM_HOME}/include -Wno-c99-designator -Wno-macro-redefined -Wno-duplicate-decl-specifier -std=c++14")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROCM_HOME sounds like a sensible solution.

@KaiSzuttor
Copy link
Member

I think in general we should try to keep these details out of the main CMakeLists.txt. Otherwise this file will be unreadable at some point. So whatever the solution is, please put it in a separate cmake module in the cmake directory of the project root.

@mkuron
Copy link
Member Author

mkuron commented Mar 10, 2020

@KaiSzuttor, it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt. CUDA detection takes up just as much space.

@KaiSzuttor
Copy link
Member

KaiSzuttor commented Mar 10, 2020

it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt. CUDA detection takes up just as much space.

Can you please explain this statement? Just because there is mess elsewhere in the file it does not mean that we should not clean up and extend the mess at another place.

@jngrad
Copy link
Member

jngrad commented Mar 10, 2020

So whatever the solution is, please put it in a separate cmake module in the cmake directory of the project root.

I'm currently trying to refactor the CMake logic for detecting CUDA. We are using the FindCUDA code that has been deprecated in CMake 3.10 and doesn't fully support CUDA 10.0+. For example, the cublas error that has been blocking the Stokesian Dynamics PR for weeks is resolved by simply using the CMake native CUDA support. As part of the refactor, I thought about moving some of the logic to dedicated *.cmake files in the espresso /cmake folder, e.g. the definitions of add_gpu_library and some library path detection and variable creation logic. However we'll probably have to keep a minimal if/else structure in the main CMakeLists.txt file for the 3 CUDA compilers we detect.

@KaiSzuttor
Copy link
Member

it makes no sense to move this stuff elsewhere because the usual place to do all the library detection is in the main CMakeLists.txt

no, it happens in the Find*.cmake files

@mkuron
Copy link
Member Author

mkuron commented Mar 10, 2020

no, it happens in the Find*.cmake files

Setting compiler flags, which is what we are currently discussing, shouldn't happen inside FindHIP.cmake. The choice of compiler flags is Espresso-specific. Find*.cmake files are supposed to be provided either by CMake or by the respective library to deal with things common to all use cases.

@KaiSzuttor
Copy link
Member

Compiler flags should be set on targets not globally, so they neither belong into the main cmake file nor into FindHIP.cmake

@KaiSzuttor
Copy link
Member

An introduction to modern cmake:

You often want a cmake folder, with all of your helper modules. This is where your Find*.cmake files go.

@jngrad
Copy link
Member

jngrad commented Mar 10, 2020

Please don't invest any more time on this PR, I'm including it in my CMake refactoring PR. It's taking more time than anticipated because the FindCython.cmake is also broken.

@KaiSzuttor KaiSzuttor closed this Mar 12, 2020
kodiakhq bot added a commit that referenced this pull request Mar 12, 2020
Description of changes:
- move logic to import packages from `CMakeLists.txt` to dedicated helper files `cmake/Find<package>.cmake` for `find_package()`
- enforce the Cython version requested in `CMakeLists.txt`
- CMake now fails if `WITH_CUDA` is set to true but no CUDA-capable compiler is found
- CMake now fails if `WITH_CLANG_TIDY` is set to true but Clang-tidy is not found or its version doesn't match the Clang compiler version
- drop deprecated `FindCUDA` in favor of native CUDA support in CMake 3.10 (required for  #3445)
- add partial support for ROCm 3.1 (closes #3571, required for espressomd/docker#156)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants