Cannot compile with GCC 10 or later for the CUDA backend if the kernel functor is templated #1472

Closed
fwyzard opened this issue Nov 17, 2021 · 10 comments · Fixed by #1474

Comments

@fwyzard
Contributor

fwyzard commented Nov 17, 2021

It looks like a change in GCC 10 breaks the compilation of the CUDA backend for kernels written with Alpaka, if the operator() is templated on arguments other than TAcc.

Here is an extract of the code:

namespace cms {
  namespace alpakatools {

    struct countFromVector {
      template <typename T_Acc, typename Histo, typename T>
      ALPAKA_FN_ACC void operator()(const T_Acc &acc,
                                    Histo *__restrict__ h,
                                    uint32_t nh,
                                    T const *__restrict__ v,
                                    uint32_t const *__restrict__ offsets) const {
        ...
      }
    };

    template <typename Histo, typename T>
    ALPAKA_FN_HOST ALPAKA_FN_INLINE __attribute__((always_inline)) void fillManyFromVector(
        Histo *__restrict__ h,
        uint32_t nh,
        T const *__restrict__ v,
        uint32_t const *__restrict__ offsets,
        uint32_t totSize,
        unsigned int nthreads,
        ::ALPAKA_ACCELERATOR_NAMESPACE::Queue &queue) {
      launchZero(h, queue);

      ...

      alpaka::enqueue(queue,
                      alpaka::createTaskKernel<::ALPAKA_ACCELERATOR_NAMESPACE::Acc1D>(
                          workDiv, countFromVector(), h, nh, v, offsets));
      ...
    }

  }  // namespace alpakatools
}  // namespace cms

When used later on in some application code, e.g.

::cms::alpakatools::fillManyFromVector(hits_d.phiBinner(), 10, hits_d.c_iphi(), hits_d.c_hitsLayerStart(), nHits, 256, queue);

this results in the error

.../external/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp: In instantiation of ‘static void alpaka::traits::Enqueue<alpaka::QueueUniformCudaHipRtNonBlocking, alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>, void>::enqueue(alpaka::QueueUniformCudaHipRtNonBlocking&, const alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>&) [with TAcc = alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, unsigned int>; TDim = std::integral_constant<long unsigned int, 1>; TIdx = unsigned int; TKernelFnObj = cms::alpakatools::countFromVector; TArgs = {cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>* __restrict__&, unsigned int&, const short int* __restrict__&, const unsigned int* __restrict__&}]’:
.../external/alpaka/include/alpaka/queue/Traits.hpp:48:58:   required from ‘void alpaka::enqueue(TQueue&, TTask&&) [with TQueue = alpaka::QueueUniformCudaHipRtNonBlocking; TTask = alpaka::TaskKernelGpuUniformCudaHipRt<alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, unsigned int>, std::integral_constant<long unsigned int, 1>, unsigned int, cms::alpakatools::countFromVector, cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>* __restrict__&, unsigned int&, const short int* __restrict__&, const unsigned int* __restrict__&>]’
.../src/alpaka/AlpakaCore/HistoContainer.h:113:16:   required from ‘void cms::alpakatools::fillManyFromVector(Histo*, uint32_t, const T*, const uint32_t*, uint32_t, unsigned int, alpaka_cuda_async::Queue&) [with Histo = cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>; T = short int; uint32_t = unsigned int; alpaka_cuda_async::Queue = alpaka::QueueUniformCudaHipRtNonBlocking]’
.../src/alpaka/plugin-SiPixelRecHits/alpaka/PixelRecHits.cc:72:125:   required from here
.../external/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:243:6: error: no matches converting function ‘uniformCudaHipKernel’ to type ‘void (*)(class alpaka::Vec<std::integral_constant<long unsigned int, 1>, unsigned int>, struct cms::alpakatools::countFromVector, class cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>*, unsigned int, const short int*, const unsigned int*)’
  243 |                 auto kernelName = uniform_cuda_hip::detail::
      |      ^         
.../external/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:74:1: note: candidate is: ‘template<class TAcc, class TDim, class TIdx, class TKernelFnObj, class ... TArgs> void alpaka::uniform_cuda_hip::detail::uniformCudaHipKernel(alpaka::Vec<TDim, TVal>, TKernelFnObj, TArgs ...)’
   74 |             __global__ void uniformCudaHipKernel(
      | ^           ~~~~~~~~

The error happens with GCC 10.2 and 11.2; the same code compiles fine with GCC 9.

To reproduce:

git clone git@github.com:cms-patatrack/pixeltrack-standalone.git
cd pixeltrack-standalone
git checkout 2e46f69b77d8e0f4c1afd2b316e6b6b5dd884243
# edit the Makefile to point to a recent version of CUDA (default is /usr/local/cuda) and g++ from GCC 10
make alpaka

After setting up the environment and building the dependencies, a simpler way to reproduce the problem is

nvcc -x cu  \
  -std=c++17  -O3  \
  -DALPAKA_ACC_GPU_CUDA_ENABLED  \
  -DALPAKA_ACC_GPU_CUDA_ONLY_MODE  \
  --expt-relaxed-constexpr --expt-extended-lambda  \
  -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored  \
  --diag-suppress 20014 \
  --cudart=shared  \
  -gencode arch=compute_75,code=[sm_75,compute_75]  \
  -ccbin g++-10  \
  --compiler-options '-fPIC -fdiagnostics-show-option -felide-constructors -fmessage-length=0 -fno-math-errno -ftree-vectorize -fvisibility-inlines-hidden --param vect-max-version-for-alias-checks=50 -msse3 -pipe -pthread'  \
  -DGPU_DEBUG  \
  -Isrc/alpaka  \
  -DSRC_DIR=src/alpaka  \
  -DLIB_DIR=lib/alpaka  \
  -isystem external/alpaka/include  \
  -isystem external/boost/include  \
  -isystem external/libbacktrace/include  \
  -dc \
  -c src/alpaka/test/alpaka/HistoContainer_t.cc  \
  -o HistoContainer_t.cc.cuda.o
@fwyzard
Contributor Author

fwyzard commented Nov 17, 2021

The HEAD of our master branch uses Alpaka 0.8.0-rc1, but the same error also occurs with Alpaka 0.7.0.

@fwyzard
Contributor Author

fwyzard commented Nov 18, 2021

After going through plenty of intermediate code generated by nvcc and preprocessed code generated by g++, I now suspect that the issue may be related to template arguments with the __restrict__ attribute.

Looking at the code snippets above, the kernel has the signature

template <typename T_Acc, typename Histo, typename T> ALPAKA_FN_ACC void countFromVector::operator()(
    const T_Acc &acc,
    Histo *__restrict__ h,
    uint32_t nh,
    T const *__restrict__ v,
    uint32_t const *__restrict__ offsets) const;

but the compiler is trying to cast the function that calls it to a function pointer of type (here T = short int)

void (*)(class alpaka::Vec<std::integral_constant<long unsigned int, 1>, unsigned int>,
    struct cms::alpakatools::countFromVector,
    class cms::alpakatools::HistoContainer<short int, 128, 49152, 16, short unsigned int, 10>*,
    unsigned int,
    const short int*,
    const unsigned int*)

without the __restrict__ attributes.
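A standalone sketch (not from the original report; plain g++ with the __restrict__ extension is enough to build it) of how the qualifier ends up in the deduced template arguments. It only shows that __restrict__ survives forwarding-reference deduction, matching the __restrict__& entries visible in the TArgs of the error message above.

#include <type_traits>

template <typename T>
void probe(T&&) {
  // For an lvalue of type "int* __restrict__", T is deduced as a reference to the
  // restrict-qualified pointer, not as a plain "int*&".
  static_assert(std::is_same<T, int* __restrict__ &>::value,
                "the __restrict__ qualifier is part of the deduced type");
}

int main() {
  int x = 0;
  int* __restrict__ p = &x;
  probe(p);
  return 0;
}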

@fwyzard
Contributor Author

fwyzard commented Nov 18, 2021

I found a much simpler way to reproduce the problem, modifying the vectorAdd example to exhibit the same issue:

diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a72d..07dcc547f66a 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,12 +163,15 @@ auto main() -> int
     VectorAddKernel kernel;
 
     // Create the kernel execution task.
+    Data const* __restrict__ const pBufAccA(alpaka::getPtrNative(bufAccA));
+    Data const* __restrict__ const pBufAccB(alpaka::getPtrNative(bufAccB));
+    Data* __restrict__ const pBufAccC(alpaka::getPtrNative(bufAccC));
     auto const taskKernel = alpaka::createTaskKernel<Acc>(
         workDiv,
         kernel,
-        alpaka::getPtrNative(bufAccA),
-        alpaka::getPtrNative(bufAccB),
-        alpaka::getPtrNative(bufAccC),
+        pBufAccA,
+        pBufAccB,
+        pBufAccC,
         numElements);
 
     // Enqueue the kernel execution task

The modified example compiles fine with GCC 9:

$ g++ --version
g++ (GCC) 9.2.1 20191120 (Red Hat 9.2.1-2)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc -std=c++17 -O3 -x cu -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --expt-relaxed-constexpr --expt-extended-lambda --generate-line-info --source-in-ptx -Iinclude -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -DALPAKA_DEBUG=ALPAKA_DEBUG_FULL example/vectorAdd/src/vectorAdd.cpp -o vectorAdd

but fails with GCC 10:

$ g++ --version
g++ (GCC) 10.2.1 20201112 (Red Hat 10.2.1-8)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc -std=c++17 -O3 -x cu -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --expt-relaxed-constexpr --expt-extended-lambda --generate-line-info --source-in-ptx -Iinclude -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -DALPAKA_DEBUG=ALPAKA_DEBUG_FULL example/vectorAdd/src/vectorAdd.cpp -o vectorAdd
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp: In instantiation of ‘static void alpaka::traits::Enqueue<alpaka::QueueUniformCudaHipRtBlocking, alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>, void>::enqueue(alpaka::QueueUniformCudaHipRtBlocking&, const alpaka::TaskKernelGpuUniformCudaHipRt<TAcc, TDim, TIdx, TKernelFnObj, TArgs ...>&) [with TAcc = alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, long unsigned int>; TDim = std::integral_constant<long unsigned int, 1>; TIdx = long unsigned int; TKernelFnObj = VectorAddKernel; TArgs = {const unsigned int* const __restrict__&, const unsigned int* const __restrict__&, unsigned int* const __restrict__&, const long unsigned int&}]’:
include/alpaka/queue/Traits.hpp:48:58:   required from ‘void alpaka::enqueue(TQueue&, TTask&&) [with TQueue = alpaka::QueueUniformCudaHipRtBlocking; TTask = const alpaka::TaskKernelGpuUniformCudaHipRt<alpaka::AccGpuCudaRt<std::integral_constant<long unsigned int, 1>, long unsigned int>, std::integral_constant<long unsigned int, 1>, long unsigned int, VectorAddKernel, const unsigned int* const __restrict__&, const unsigned int* const __restrict__&, unsigned int* const __restrict__&, const long unsigned int&>&]’
example/vectorAdd/src/vectorAdd.cpp:180:34:   required from here
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:374:6: error: no matches converting function ‘uniformCudaHipKernel’ to type ‘void (*)(class alpaka::Vec<std::integral_constant<long unsigned int, 1>, long unsigned int>, struct VectorAddKernel, const unsigned int*, const unsigned int*, unsigned int*, long unsigned int)’
  374 |                 auto kernelName = uniform_cuda_hip::detail::
      |      ^         
include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:74:1: note: candidate is: ‘template<class TAcc, class TDim, class TIdx, class TKernelFnObj, class ... TArgs> void alpaka::uniform_cuda_hip::detail::uniformCudaHipKernel(alpaka::Vec<TDim, TVal>, TKernelFnObj, TArgs ...)’
   74 |             __global__ void uniformCudaHipKernel(
      | ^           ~~~~~~~~

@j-stephan
Member

Thanks for reporting this. Which CUDA version are you using here?

@psychocoderHPC
Member

@fwyzard I can reproduce the error with your mini example but currently do not understand why it is failing.

@psychocoderHPC
Member

If I change the alpaka code to be explicit and accept only the signature for the vector add kernel, everything works.

diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a7..7434805164 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,13 +163,10 @@ auto main() -> int
     VectorAddKernel kernel;
 
     // Create the kernel execution task.
-    auto const taskKernel = alpaka::createTaskKernel<Acc>(
-        workDiv,
-        kernel,
-        alpaka::getPtrNative(bufAccA),
-        alpaka::getPtrNative(bufAccB),
-        alpaka::getPtrNative(bufAccC),
-        numElements);
+    Data const *  const    pBufAccA(alpaka::getPtrNative(bufAccA));
+    Data const*   const  pBufAccB(alpaka::getPtrNative(bufAccB));
+    Data*  __restrict__ const  pBufAccC(alpaka::getPtrNative(bufAccC));
+    auto const taskKernel = alpaka::createTaskKernel<Acc>(workDiv, kernel, pBufAccA, pBufAccB, pBufAccC, numElements);
 
     // Enqueue the kernel execution task
     {
diff --git a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
index 6af0b62bfe..42632db89f 100644
--- a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
+++ b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
@@ -74,7 +74,11 @@ namespace alpaka
             __global__ void uniformCudaHipKernel(
                 Vec<TDim, TIdx> const threadElemExtent,
                 TKernelFnObj const kernelFnObj,
+#if 0
                 TArgs... args)
+#else
+                uint32_t const * const a,uint32_t const * const b,uint32_t* const  c,size_t s)
+#endif
             {
 #    if BOOST_ARCH_PTX && (BOOST_ARCH_PTX < BOOST_VERSION_NUMBER(2, 0, 0))
 #        error "Device capability >= 2.0 is required!"
@@ -83,12 +87,11 @@ namespace alpaka
                 const TAcc acc(threadElemExtent);
 
 // with clang it is not possible to query std::result_of for a pure device lambda created on the host side
-#    if !(BOOST_COMP_CLANG_CUDA && BOOST_COMP_CLANG)
-                static_assert(
-                    std::is_same<decltype(kernelFnObj(const_cast<TAcc const&>(acc), args...)), void>::value,
-                    "The TKernelFnObj is required to return void!");
-#    endif
+#if 0
                 kernelFnObj(const_cast<TAcc const&>(acc), args...);
+#else
+                kernelFnObj(const_cast<TAcc const&>(acc), a,b,c,s);
+#endif
             }
 
             template<typename TDim, typename TIdx>

IMO the error is not because __restrict__ and plain pointers do not match in their signatures. Am I overlooking something in my diff?

@psychocoderHPC
Member

@fwyzard Is there a benefit to declaring the pointers __restrict__ on the host side before the kernel is called? Is it not enough to mark the parameters of the kernel (functor) with __restrict__?

@psychocoderHPC
Member

I wrote my own remove_restrict type trait and patched alpaka, and the mini app now works.

diff --git a/example/vectorAdd/src/vectorAdd.cpp b/example/vectorAdd/src/vectorAdd.cpp
index 3b5de083a7..7434805164 100644
--- a/example/vectorAdd/src/vectorAdd.cpp
+++ b/example/vectorAdd/src/vectorAdd.cpp
@@ -163,13 +163,10 @@ auto main() -> int
     VectorAddKernel kernel;
 
     // Create the kernel execution task.
-    auto const taskKernel = alpaka::createTaskKernel<Acc>(
-        workDiv,
-        kernel,
-        alpaka::getPtrNative(bufAccA),
-        alpaka::getPtrNative(bufAccB),
-        alpaka::getPtrNative(bufAccC),
-        numElements);
+    Data const *  const    pBufAccA(alpaka::getPtrNative(bufAccA));
+    Data const*   const  pBufAccB(alpaka::getPtrNative(bufAccB));
+    Data*  __restrict__ const  pBufAccC(alpaka::getPtrNative(bufAccC));
+    auto const taskKernel = alpaka::createTaskKernel<Acc>(workDiv, kernel, pBufAccA, pBufAccB, pBufAccC, numElements);
 
     // Enqueue the kernel execution task
     {
diff --git a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
index 6af0b62bfe..ee98ba31b3 100644
--- a/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
+++ b/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp
@@ -66,6 +66,18 @@ namespace alpaka
 {
     namespace uniform_cuda_hip
     {
+        template<typename A>
+        struct remove_restrict
+        {
+            using type = A;
+        };
+
+        template<typename A>
+        struct remove_restrict<A __restrict__>
+        {
+            using type = A;
+        };
+
         namespace detail
         {
             //! The GPU CUDA/HIP kernel entry point.
@@ -74,7 +86,7 @@ namespace alpaka
             __global__ void uniformCudaHipKernel(
                 Vec<TDim, TIdx> const threadElemExtent,
                 TKernelFnObj const kernelFnObj,
-                TArgs... args)
+                typename remove_restrict<TArgs>::type... args)
             {
 #    if BOOST_ARCH_PTX && (BOOST_ARCH_PTX < BOOST_VERSION_NUMBER(2, 0, 0))
 #        error "Device capability >= 2.0 is required!"

I will provide a PR.
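For reference, a self-contained sketch of the remove_restrict idea from the diff above, compilable with plain g++ (the specialization here matches restrict-qualified pointers directly; the exact form merged into alpaka by the pull request may differ):

#include <type_traits>

template <typename T>
struct remove_restrict {
    // default: leave the type untouched
    using type = T;
};

template <typename T>
struct remove_restrict<T* __restrict__> {
    // strip the __restrict__ qualifier from pointer types
    using type = T*;
};

template <typename T>
using remove_restrict_t = typename remove_restrict<T>::type;

// As in the diff above, the kernel entry point can declare its parameter pack as
// "typename remove_restrict<TArgs>::type... args", so that the generated signature
// matches the plain-pointer function pointer type expected on the host side.
static_assert(std::is_same<remove_restrict_t<int* __restrict__>, int*>::value, "");
static_assert(std::is_same<remove_restrict_t<int const* __restrict__>, int const*>::value, "");
static_assert(std::is_same<remove_restrict_t<float>, float>::value, "");

int main() { return 0; }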

psychocoderHPC added a commit to psychocoderHPC/alpaka that referenced this issue Nov 18, 2021
fix alpaka-group#1472

Provide a type trait to remove __restrict__ from a type.
@fwyzard
Contributor Author

fwyzard commented Nov 18, 2021

@fwyzard Is there a benefit to declaring the pointers __restrict__ on the host side before the kernel is called?

Probably not :-)
Checking whether it makes any difference is still on my to-do list ...
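If the host-side qualifier really brings no benefit, a possible application-side workaround (hedged sketch with made-up stand-in names, not alpaka API) is to pass plain-pointer copies to the task-kernel factory, so that the deduced TArgs no longer carry __restrict__ while the kernel's operator() keeps its own __restrict__ parameters:

#include <cstdint>

// Stand-in for alpaka::createTaskKernel, which deduces TArgs from its arguments
// via forwarding references; only the deduction behaviour matters here.
template <typename... TArgs>
void createTaskKernelLike(TArgs&&...) {}

// Stand-in for a host-side helper like fillManyFromVector.
void fillManyFromVectorLike(std::uint32_t* __restrict__ h,
                            std::uint32_t const* __restrict__ v) {
  // Copy into unqualified locals: the deduced TArgs become plain pointer types,
  // while the kernel functor can still declare its own __restrict__ parameters.
  std::uint32_t* hPlain = h;
  std::uint32_t const* vPlain = v;
  createTaskKernelLike(hPlain, vPlain);
}

int main() { return 0; }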

@fwyzard
Contributor Author

fwyzard commented Nov 18, 2021

I can confirm that #1474 fixes the compilation problem with GCC 10.

j-stephan pushed a commit that referenced this issue Nov 29, 2021
fix #1472

Provide a type trait to remove __restrict__ from a type.
psychocoderHPC added a commit that referenced this issue Dec 8, 2021
fix #1472

Provide a type trait to remove __restrict__ from a type.