Refactor CMake logic for the CUDA compiler #3582
Conversation
Codecov Report

@@ Coverage Diff @@
##         python   #3582   +/- ##
==================================
  Coverage    87%     87%
==================================
  Files       533     533
  Lines     22959   22959
==================================
  Hits      20159   20159
  Misses     2800    2800

Continue to review the full report at Codecov.
cmake/FindCUDACompilerNVCC.cmake (outdated):

set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG} -g")
set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE} -O3 -DNDEBUG")
set(CUDA_NVCC_FLAGS_MINSIZEREL "${CUDA_NVCC_FLAGS_MINSIZEREL} -Os -DNDEBUG")
Several issues here:
- `-Os` is not a valid nvcc optimization level: `nvcc fatal : 's': expected a number`
- there seems to be a difference between `-O2`, `-Xptxas -O2`, and `-Xcompiler -O2` (StackOverflow)
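This could be sketched as follows; a hedged example, assuming the flag semantics described in the linked StackOverflow answer (plain `-O<N>` for the nvcc front-end, `-Xptxas` for the PTX assembler, `-Xcompiler` for the host compiler):

```cmake
# Sketch (assumption): the three optimization flags reach different stages of
# the nvcc pipeline, so a release build may need to set all of them explicitly.
# Note: nvcc only accepts numeric levels, so -Os must not be used here.
set(CUDA_NVCC_FLAGS_RELEASE "-O3 -Xptxas -O3 -Xcompiler -O3 -DNDEBUG")
```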
Progress:
- CUDA compiler flags are actually quite complicated. Also, device and host compiler optimization flags don't mix well.
- We don't need to touch the host and device flags in `build_cmake.sh`. The logic is wrong anyway, and it is partially overridden by CMake in Debug builds.

I'm currently removing a lot of useless code in `build_cmake.sh`, but getting the CMake logic right will take me a few more days, I'm afraid. After that, we'll be able to experiment with CMake pseudo-interfaces for GPU targets.
I wasted two man-hours investigating a Clang compiler error, because when we do

espresso/src/core/CMakeLists.txt, line 147 in d67ab4a:
add_gpu_library(EspressoCore SHARED ${EspressoCore_SRC} ${EspressoCuda_SRC})

we forward CUDA compiler flags to regular .cpp files:

Line 165 in d67ab4a:
set_source_files_properties(${ARG_UNPARSED_ARGUMENTS} PROPERTIES LANGUAGE "CXX" COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")

even though a few lines later there's a mechanism to apply those flags to .cu files only:

Line 172 in d67ab4a:
if(${file} MATCHES "\\.cu$")
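A minimal sketch of what the guarded version could look like, assuming the same `ARG_UNPARSED_ARGUMENTS` list used by `add_gpu_library()`:

```cmake
# Sketch: apply CUDA_NVCC_FLAGS only to .cu sources, so that regular .cpp
# files keep the host compiler flags.
foreach(file IN LISTS ARG_UNPARSED_ARGUMENTS)
  if(${file} MATCHES "\\.cu$")
    set_source_files_properties(
      ${file} PROPERTIES COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")
  endif()
endforeach()
```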
I wasted one man-hour because neither `CMakeLists.txt` nor `build_cmake.sh` throws an error when using the incompatible combination of CMake flags `-DCUDA_NVCC_EXECUTABLE=$(which clang++) -DWITH_COVERAGE=ON`.
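A configure-time guard along these lines could fail early; a sketch, assuming the variable names used elsewhere in this PR:

```cmake
# Sketch (assumption): reject the incompatible combination at configure time
# instead of failing later with a cryptic compiler error.
if(WITH_COVERAGE AND CUDA_NVCC_EXECUTABLE MATCHES "clang")
  message(FATAL_ERROR
          "Coverage is not supported when compiling CUDA code with Clang")
endif()
```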
My last PR introduced a regression: the Boost version depends on the CUDA version, so CUDA must be loaded before Boost...
Why do we override C++ flags only for Release builds and not for all other build types?

espresso/src/core/CMakeLists.txt, lines 78 to 79 in f291cf4:
"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-g>"
"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-O0>"
Why `-O1` specifically? And why only for Release builds?

Lines 347 to 348 in f291cf4:
if(WITH_ASAN)
  set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -g -O1")
FindCUDA didn't really have a concept of debug vs. release, so we manually set these flags to match the host compiler and get debuggable or optimized CUDA code as requested.
> FindCUDA didn't really have a concept of debug vs. release

Doesn't it, since 3.0? `CUDA_NVCC_FLAGS_<CONFIG>`. HIP also does: `HIP_<COMPILER>_FLAGS_<CONFIG>`.
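For illustration, the per-configuration variables mentioned above can be populated like this (the flag values are assumptions, mirroring the usual host compiler defaults):

```cmake
# Sketch: FindCUDA consumes CUDA_NVCC_FLAGS_<CONFIG> for the matching
# CMAKE_BUILD_TYPE; the values below are illustrative.
set(CUDA_NVCC_FLAGS_DEBUG "-g")
set(CUDA_NVCC_FLAGS_RELEASE "-O3 -DNDEBUG")
set(CUDA_NVCC_FLAGS_RELWITHDEBINFO "-O2 -g -DNDEBUG")
```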
I believe that didn't set the flags for both host and device, but that may no longer be true. The `-g -O1` for `WITH_ASAN` is a compromise between good sanitizability and reasonable performance. In Debug mode, you have `-O0 -g` anyway.
Force-pushed from 64e9a4c to 2c93112.
Fixes a regression from 1e3fc3b: the Boost version depends on the CUDA version for Intel, therefore CUDA must be loaded first, but the CUDA version depends on the C++ version, so C++14 must be defined at the top of the file.
The user can now provide the name of the CUDA compiler as a CMake flag instead of modifying CMake variables to bypass the guessing mechanism. CMake fails if the selected compiler is not found. The FindCUDACompiler helper file was split into separate .cmake files.
The minimum CUDA version is 9.0 for NVCC and 8.0 for Clang. For the Clang compiler and the Clang-based HIP compiler, the minimum version that is tested in CI is 6.0.0.
The ESPResSo CMake project now has proper dependencies for every target in the source code, documentation and tests. It is no longer necessary to delete the build directory to start a fresh build.
Explicitly toggle GPU code on/off with WITH_CUDA and select the compiler with WITH_CUDA_COMPILER.
With the CUDA compiler guessing mechanism removed, the user must explicitly state whether CUDA is available and which compiler to use.
In ROCm 3.0 and 3.1, environment variables for hipcc and hcc are overridden by incorrect paths (espressomd/docker#156). This causes CMake to generate an incorrect linking command for EspressoCore.so: in `/opt/rocm/bin/hipcc_cmake_linker_helper /opt/rocm -fPIC ...`, either path `/opt/rocm` is an empty string, or both the linker path and path `/opt/rocm` are empty strings. Calling `find_package()` twice with an overridden `HCC_PATH` fixes the linking command.
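The workaround could look roughly like this; a sketch under the assumption that `HCC_PATH` is the variable consumed by the HIP find module (the path is illustrative):

```cmake
# Sketch (assumption): override the incorrect environment-provided path and
# re-run detection so the linker helper receives a valid ROCm prefix.
set(HCC_PATH "/opt/rocm/hcc")
find_package(HIP MODULE REQUIRED)
```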
Knowing the CMake version is extremely useful when reviewing CMake logs attached in bug reports.
The C++ standard used for CUDA code is set in `CMAKE_CUDA_STANDARD`. Variable `CMAKE_CUDA_VERSION` was renamed to `MINIMAL_CUDA_VERSION` for clarity.
The nvcc `-O<N>` optimization flag can only take a number.
Cannot compile CUDA code with coverage enabled using Clang.
Force-pushed from 66fe6cf to 69d4962.
Rewrite CUDA flags based on the CXX flags:
- CMAKE_CXX_FLAGS_DEBUG = -O0 -g
- CMAKE_CXX_FLAGS_RELEASE = -O3 -DNDEBUG
- CMAKE_CXX_FLAGS_MINSIZEREL = -Os -DNDEBUG
- CMAKE_CXX_FLAGS_RELWITHDEBINFO = -O2 -g -DNDEBUG

Add a COVERAGE build type that uses -O0 for host and -O3 for device. This replaces the logic in the CI script that had to touch `CMAKE_CXX_FLAGS` and `CUDA_NVCC_FLAGS`. The -O0 optimization flag for host avoids ending up with source code lines in the gcov output with neither hit nor miss. According to `man gcov`:
> compile your code without optimization if you plan to use gcov because the optimization, by combining some lines of code into one function, may not give you as much information as you need
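A dedicated build type can be declared by following CMake's `<CONFIG>` naming convention; a minimal sketch (the flag values are assumptions based on the commit message above):

```cmake
# Sketch: a custom Coverage build type with unoptimized, instrumented host
# code; --coverage must also reach the linker for the gcov runtime.
set(CMAKE_CXX_FLAGS_COVERAGE "-O0 -g --coverage")
set(CMAKE_EXE_LINKER_FLAGS_COVERAGE "--coverage")
set(CMAKE_SHARED_LINKER_FLAGS_COVERAGE "--coverage")
```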
Generate a warning for incorrect build types and override them with 'Release'. Move the `set(CMAKE_BUILD_TYPE CACHE)` declaration out of the conditional such that its help message always gets displayed in cmake/ccmake. List the possible values as properties to allow cycling in ccmake and cmake-gui (instead of manually typing them). Same thing for WITH_CUDA_COMPILER. This is achieved by creating a wrapper around the `option()` function that accepts an enum value (stored as a string) instead of a boolean value.
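The wrapper mentioned above could be sketched as follows (the function name and signature are hypothetical):

```cmake
# Sketch: an option()-like command that stores an enum value as a cached
# string and registers the allowed values so ccmake/cmake-gui can cycle
# through them instead of requiring manual typing.
function(option_enum varname help_text default_value possible_values)
  set(${varname} "${default_value}" CACHE STRING "${help_text}")
  set_property(CACHE ${varname} PROPERTY STRINGS ${possible_values})
endfunction()

# hypothetical usage:
option_enum(WITH_CUDA_COMPILER "CUDA compiler" "nvcc" "nvcc;clang;hip")
```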
The same guard is used in the CMake logic for Python tests.
Force-pushed from 69d4962 to ff2f98f.
Splitting the targets:

add_library(${GPU_TARGET_NAME}_cxx ${${GPU_TARGET_NAME}_sources_cxx})
add_library(${GPU_TARGET_NAME}_cu ${${GPU_TARGET_NAME}_sources_cu})
target_link_libraries(${GPU_TARGET_NAME}_cu PUBLIC gpu_interface)
add_library(${GPU_TARGET_NAME})
set_target_properties(${GPU_TARGET_NAME} PROPERTIES LINKER_LANGUAGE "CXX")
target_link_libraries(${GPU_TARGET_NAME} PUBLIC ${GPU_TARGET_NAME}_cxx ${GPU_TARGET_NAME}_cu)
target_link_libraries(${GPU_TARGET_NAME} PRIVATE ${CUDA_LIBRARY} ${CUDART_LIBRARY})
Oh right,

target_link_libraries(EspressoCore_cu PRIVATE EspressoConfig shapes)
target_include_directories(EspressoCore_cu PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(EspressoCore PRIVATE EspressoCore_cu)

This actually compiles on Clang and passes the Python tests.
Let me amend the previous statement: we can't separate CUDA from C++ source files using two targets. If [...]
The non-portable [...] I love CMake. @KaiSzuttor Do you have any suggestion? I can't think of another way to introduce a GPU [...]
@KaiSzuttor Found the solution: installing thrust 1.9.5 to fix the [...]
Force-pushed from 8420864 to b6a485c.
CMakeLists.txt (outdated):

endif()

add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE ${cxx_interface_flags})
Out of curiosity, why do you keep the list in `cxx_interface_flags` instead of directly using `target_compile_options()`?
We actually need to pass these flags to both the compiler and linker, otherwise we get unresolved symbols (e.g. ASAN in clang:6.0 logfile).
In the line below, I pass the same list to `target_link_libraries()`, which defines linker flags. CMake 3.13 introduced `target_link_options()` to pass only linker flags, in an effort to make the intent clearer thanks to the name similarity with `target_compile_options()`.
Only the `-fsanitize=` flags, no? (cf. https://gitlab.icp.uni-stuttgart.de/fweik/mdx/-/blob/master/CMakeLists.txt) Otherwise passing compiler options to the linker doesn't do anything, and it is just confusing. Also, you still wouldn't need the list, right?
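The suggestion boils down to something like this sketch, reusing the `cxx_interface` target from the diff above (the sanitizer choice is illustrative):

```cmake
# Sketch: only the -fsanitize= flag needs to be duplicated at link time,
# because some sanitizers ship a runtime library that must be linked in.
add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE -fsanitize=address)
target_link_libraries(cxx_interface INTERFACE -fsanitize=address)
```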
Thanks for the hint! I wasn't sure if `-fsanitize=...` alone was sufficient, but it actually was.
Yeah, some of the sanitizers have a runtime component, which needs to be linked...
Force-pushed from b6a485c to 9912ad5.
The [...]

// lb_inertialess_tracers_cuda_interface.hpp
extern IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host;
// lb_inertialess_tracers_cuda_interface.cpp
IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host = nullptr;
// lb_inertialess_tracers_cuda.cu
if (IBM_ParticleDataInput_host != NULL) {

Did I introduce a design flaw by creating a shared object [...] The [...]
@jngrad IIRC there are cyclic dependencies between the CUDA and the core code...
@KaiSzuttor The PR is ready from my side. Let me know if you have additional changes to suggest. Otherwise, I'll rebase commits 58bdbc8 to 7025e65 to make the git history less chaotic, and resolve the merge conflict. The slowdown issues mentioned 2 weeks ago have been resolved and the [...]
Looks good to me besides the comments. Thanks @jngrad that looks much cleaner now.
Use interface libraries for compiler flags instead of populating variables with global scope that are injected into the compilation and linking commands of all libraries (e.g. CMAKE_CXX_FLAGS). Remove duplicated compile flags. Give meaningful names to variables in FindCUDA* CMake files. Document CMake policies and add CMP0025 to distinguish between Clang and AppleClang. Replace simple if/else blocks with generator expressions.
Move the ROCm path patching logic into FindCUDACompilerHIP.cmake and check only the HIP version.
The double quotes were not removed by the shell interpreter.
Force-pushed from 7025e65 to 732b47d, then from 732b47d to b53e4c9.
Clang returns "version unknown" for unsupported CUDA libraries, or doesn't return a version string at all (depending on CMAKE_CXX_FLAGS), causing the CMake regex to store the complete Clang stdout in the CUDA_VERSION variable instead of a valid version number. This is now fixed, and the CUDA version is shown as <major>.<minor>.
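The fix presumably narrows the match to a bare version number; a sketch (variable names hypothetical):

```cmake
# Sketch (assumption): extract only <major>.<minor> from the compiler output
# instead of storing the complete stdout in CUDA_VERSION.
string(REGEX MATCH "[0-9]+\\.[0-9]+" CUDA_VERSION "${COMPILER_STDOUT}")
```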
Description of changes:
- the `WITH_CUDA` option is now opt-in
- the CUDA compiler is selected with `WITH_CUDA_COMPILER` (any of "nvcc", "clang", "hip")
- the flag logic in `CMakeLists.txt` / `build_cmake.sh` was replaced with dedicated build types (`Coverage` and `RelWithAssert`)