Cannot compile with GCC 10 or later for the CUDA backend if the kernel functor is templated #1472
The HEAD of our …

After going through plenty of intermediate code generated by nvcc and preprocessed code generated by g++, I now suspect that the issue is related to template arguments that carry the `__restrict__` qualifier.

Looking at the code snippets above, the kernel has the signature

```cpp
template <typename T_Acc, typename Histo, typename T>
ALPAKA_FN_ACC void countFromVector::operator()(
    const T_Acc& acc,
    Histo* __restrict__ h,
    uint32_t nh,
    T const* __restrict__ v,
    uint32_t const* __restrict__ offsets) const;
```

but the compiler is trying to cast the function that calls it to a function pointer of type

```
void (*)(class alpaka::Vec<std::integral_constant<long unsigned int, 1>, unsigned int>,
         struct cms::alpakatools::countFromVector,
         class cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>*,
         unsigned int,
         const short int*,
         const unsigned int*)
```

i.e. without the `__restrict__` qualifiers.
I found a much simpler way to reproduce the problem, modifying the vectorAdd example:

```diff
diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a72d..07dcc547f66a 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,12 +163,15 @@ auto main() -> int
         VectorAddKernel kernel;

         // Create the kernel execution task.
+        Data const* __restrict__ const pBufAccA(alpaka::getPtrNative(bufAccA));
+        Data const* __restrict__ const pBufAccB(alpaka::getPtrNative(bufAccB));
+        Data* __restrict__ const pBufAccC(alpaka::getPtrNative(bufAccC));
         auto const taskKernel = alpaka::createTaskKernel<Acc>(
             workDiv,
             kernel,
-            alpaka::getPtrNative(bufAccA),
-            alpaka::getPtrNative(bufAccB),
-            alpaka::getPtrNative(bufAccC),
+            pBufAccA,
+            pBufAccB,
+            pBufAccC,
             numElements);

         // Enqueue the kernel execution task
```

The modified example compiles fine with GCC 9:

```console
$ g++ --version
g++ (GCC) 9.2.1 20191120 (Red Hat 9.2.1-2)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc -std=c++17 -O3 -x cu -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --expt-relaxed-constexpr --expt-extended-lambda --generate-line-info --source-in-ptx -Iinclude -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -DALPAKA_DEBUG=ALPAKA_DEBUG_FULL example/vectorAdd/src/vectorAdd.cpp -o vectorAdd
```

but fails with GCC 10:

```console
$ g++ --version
g++ (GCC) 10.2.1 20201112 (Red Hat 10.2.1-8)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc -std=c++17 -O3 -x cu -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --expt-relaxed-constexpr --expt-extended-lambda --generate-line-info --source-in-ptx -Iinclude -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -DALPAKA_DEBUG=ALPAKA_DEBUG_FULL example/vectorAdd/src/vectorAdd.cpp -o vectorAdd
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp: In instantiation of ‘static void alpaka::traits::Enqueue<alpaka::QueueUniformCudaHipRtBlocking, alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>, void>::enqueue(alpaka::QueueUniformCudaHipRtBlocking&, const alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>&) [with TAcc = alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, long unsigned int>; TDim = std::integral_constant<long unsigned int, 1>; TIdx = long unsigned int; TKernelFnObj = VectorAddKernel; TArgs = {const unsigned int* const __restrict__&, const unsigned int* const __restrict__&, unsigned int* const __restrict__&, const long unsigned int&}]’:
include/alpaka/queue/Traits.hpp:48:58:   required from ‘void alpaka::enqueue(TQueue&, TTask&&) [with TQueue = alpaka::QueueUniformCudaHipRtBlocking; TTask = const alpaka::TaskKernelGpuUniformCudaHipRt<alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, long unsigned int>, std::integral_constant<long unsigned int, 1>, long unsigned int, VectorAddKernel, const unsigned int* const __restrict__&, const unsigned int* const __restrict__&, unsigned int* const __restrict__&, const long unsigned int&>&]’
example/vectorAdd/src/vectorAdd.cpp:180:34:   required from here
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:374:6: error: no matches converting function ‘uniformCudaHipKernel’ to type ‘void (*)(class alpaka::Vec<std::integral_constant<long unsigned int, 1>, long unsigned int>, struct VectorAddKernel, const unsigned int*, const unsigned int*, unsigned int*, long unsigned int)’
  374 |         auto kernelName = uniform_cuda_hip::detail::
      |         ^
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:74:1: note: candidate is: ‘template<class TAcc, class TDim, class TIdx, class TKernelFnObj, class ... TArgs> void alpaka::uniform_cuda_hip::detail::uniformCudaHipKernel(alpaka::Vec<TDim, TVal>, TKernelFnObj, TArgs ...)’
   74 | __global__ void uniformCudaHipKernel(
      | ^~~~~~~~
```
Thanks for reporting this. Which CUDA version are you using here?
@fwyzard I can reproduce the error with your mini example, but currently do not understand why it is failing.
If I change the alpaka code to be explicit and accept only the exact signature needed for the vector add example, everything works:

```diff
diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a7..7434805164 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,13 +163,10 @@ auto main() -> int
         VectorAddKernel kernel;

         // Create the kernel execution task.
-        auto const taskKernel = alpaka::createTaskKernel<Acc>(
-            workDiv,
-            kernel,
-            alpaka::getPtrNative(bufAccA),
-            alpaka::getPtrNative(bufAccB),
-            alpaka::getPtrNative(bufAccC),
-            numElements);
+        Data const* const pBufAccA(alpaka::getPtrNative(bufAccA));
+        Data const* const pBufAccB(alpaka::getPtrNative(bufAccB));
+        Data* __restrict__ const pBufAccC(alpaka::getPtrNative(bufAccC));
+        auto const taskKernel = alpaka::createTaskKernel<Acc>(workDiv, kernel, pBufAccA, pBufAccB, pBufAccC, numElements);

         // Enqueue the kernel execution task
         {
diff --git a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
index 6af0b62bfe..42632db89f 100644
--- a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
+++ b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
@@ -74,7 +74,11 @@ namespace alpaka
         __global__ void uniformCudaHipKernel(
             Vec<TDim, TIdx> const threadElemExtent,
             TKernelFnObj const kernelFnObj,
+#if 0
             TArgs... args)
+#else
+            uint32_t const* const a, uint32_t const* const b, uint32_t* const c, size_t s)
+#endif
         {
 #    if BOOST_ARCH_PTX && (BOOST_ARCH_PTX < BOOST_VERSION_NUMBER(2, 0, 0))
 #        error "Device capability >= 2.0 is required!"
@@ -83,12 +87,11 @@ namespace alpaka
             const TAcc acc(threadElemExtent);

             // with clang it is not possible to query std::result_of for a pure device lambda created on the host side
-#    if !(BOOST_COMP_CLANG_CUDA && BOOST_COMP_CLANG)
-            static_assert(
-                std::is_same<decltype(kernelFnObj(const_cast<TAcc const&>(acc), args...)), void>::value,
-                "The TKernelFnObj is required to return void!");
-#    endif
+#if 0
             kernelFnObj(const_cast<TAcc const&>(acc), args...);
+#else
+            kernelFnObj(const_cast<TAcc const&>(acc), a, b, c, s);
+#endif
         }

         template<typename TDim, typename TIdx>
```

IMO the error is not because …
@fwyzard Is there a benefit to declaring the pointer `__restrict__`?
I wrote my own fix:

```diff
diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a7..7434805164 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,13 +163,10 @@ auto main() -> int
         VectorAddKernel kernel;

         // Create the kernel execution task.
-        auto const taskKernel = alpaka::createTaskKernel<Acc>(
-            workDiv,
-            kernel,
-            alpaka::getPtrNative(bufAccA),
-            alpaka::getPtrNative(bufAccB),
-            alpaka::getPtrNative(bufAccC),
-            numElements);
+        Data const* const pBufAccA(alpaka::getPtrNative(bufAccA));
+        Data const* const pBufAccB(alpaka::getPtrNative(bufAccB));
+        Data* __restrict__ const pBufAccC(alpaka::getPtrNative(bufAccC));
+        auto const taskKernel = alpaka::createTaskKernel<Acc>(workDiv, kernel, pBufAccA, pBufAccB, pBufAccC, numElements);

         // Enqueue the kernel execution task
         {
diff --git a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
index 6af0b62bfe..ee98ba31b3 100644
--- a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
+++ b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
@@ -66,6 +66,18 @@ namespace alpaka
     {
         namespace uniform_cuda_hip
         {
+            template<typename A>
+            struct remove_restrict
+            {
+                using type = A;
+            };
+
+            template<typename A>
+            struct remove_restrict<A __restrict__>
+            {
+                using type = A;
+            };
+
             namespace detail
             {
                 //! The GPU CUDA/HIP kernel entry point.
@@ -74,7 +86,7 @@ namespace alpaka
                 __global__ void uniformCudaHipKernel(
                     Vec<TDim, TIdx> const threadElemExtent,
                     TKernelFnObj const kernelFnObj,
-                    TArgs... args)
+                    typename remove_restrict<TArgs>::type... args)
                 {
 #    if BOOST_ARCH_PTX && (BOOST_ARCH_PTX < BOOST_VERSION_NUMBER(2, 0, 0))
 #        error "Device capability >= 2.0 is required!"
```

I will provide a PR.
fix alpaka-group#1472 Provide a type trait to remove __restrict__ from a type.
Probably not :-)
I can confirm that #1474 fixes the compilation problem with GCC 10.
Looks like a change in GCC 10 breaks the compilation for the CUDA backend of kernels written with alpaka, if the `operator()` is templated on its arguments other than `TAcc`.

Here is an extract of the code:

When used later on in some application code, e.g.

```cpp
::cms::alpakatools::fillManyFromVector(hits_d.phiBinner(), 10, hits_d.c_iphi(), hits_d.c_hitsLayerStart(), nHits, 256, queue);
```

this results in the error quoted above.

The error happens with GCC 10.2 and 11.2; the same code compiles fine with GCC 9.
To reproduce:

After having set up the environment and built the dependencies, a simpler way to reproduce the problem is

```console
nvcc -x cu \
  -std=c++17 -O3 \
  -DALPAKA_ACC_GPU_CUDA_ENABLED \
  -DALPAKA_ACC_GPU_CUDA_ONLY_MODE \
  --expt-relaxed-constexpr --expt-extended-lambda \
  -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored \
  --diag-suppress 20014 \
  --cudart=shared \
  -gencode arch=compute_75,code=[sm_75,compute_75] \
  -ccbin g++-10 \
  --compiler-options '-fPIC -fdiagnostics-show-option -felide-constructors -fmessage-length=0 -fno-math-errno -ftree-vectorize -fvisibility-inlines-hidden --param vect-max-version-for-alias-checks=50 -msse3 -pipe -pthread' \
  -DGPU_DEBUG \
  -Isrc/alpaka \
  -DSRC_DIR=src/alpaka \
  -DLIB_DIR=lib/alpaka \
  -isystem external/alpaka/include \
  -isystem external/boost/include \
  -isystem external/libbacktrace/include \
  -dc \
  -c src/alpaka/test/alpaka/HistoContainer_t.cc \
  -o HistoContainer_t.cc.cuda.o
```