I'm getting this error:
/home/mbetten/Trilinos/cuda-intrepid-install-opt/include/Cuda/Kokkos_CudaExec.hpp(181):
Error: Formal parameter space overflowed (4096 bytes max) in function ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI19WeightChargeFunctorNS_10TeamPolicyINS_4CudaEvS5_EEEEEEvT
Christian said:
OK, I found it. The problem is in the new, more accurate function that figures out the best team size, etc. You can find it in this file:
kokkos/core/src/Cuda/Kokkos_Cuda_Internal.hpp
As a workaround for now, if you replace every "cuda_parallel_launch_local" with "cuda_parallel_launch_constant" in that file, it should work again.
I need to split the functions and make the "Large" check a template parameter, so that both branches are not instantiated for every functor. Bummer. We also need to add a test with a functor larger than 4 kB to our test suite so that we catch this next time.
Christian
crtrott added the Bug label (broken / incorrect code; it could be Kokkos' responsibility, or others', e.g., Trilinos) on Nov 10, 2015.
Fixes an issue with cuda_get_max_block_size and cuda_get_opt_block_size.
This makes the choice of constant vs local memory a template parameter
defaulted by the size of the existing DriverType template parameter.
It also changes the interface by adding a new shmem_extra argument which is
required for lambdas since the functor in those cases doesn't have a
shmem size function.
Both functions are part of the impl namespace and thus not public yet.
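
To illustrate the idea of making the "Large" check a template parameter, here is a minimal host-side sketch in plain C++. It is not the actual Kokkos code: `LaunchSelector`, `LaunchPath`, `kMaxParamBytes`, `SmallFunctor`, and `LargeFunctor` are hypothetical names made up for this example, and the 4096-byte limit is simply taken from the error message above. The point is only that when the choice is a defaulted template parameter, the compiler instantiates one specialization per functor instead of both branches.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the limit is the 4096 bytes quoted in the error message.
constexpr std::size_t kMaxParamBytes = 4096;

// Hypothetical tag for the two launch paths discussed in this issue.
enum class LaunchPath { LocalMemory, ConstantMemory };

// The "Large" check is a template parameter defaulted from sizeof(DriverType),
// so only the matching specialization is instantiated for a given functor.
template <class DriverType, bool Large = (sizeof(DriverType) > kMaxParamBytes)>
struct LaunchSelector;

// Small functors: would be passed by value as a kernel argument.
template <class DriverType>
struct LaunchSelector<DriverType, false> {
  static constexpr LaunchPath path() { return LaunchPath::LocalMemory; }
};

// Large functors: would be copied to constant memory instead.
template <class DriverType>
struct LaunchSelector<DriverType, true> {
  static constexpr LaunchPath path() { return LaunchPath::ConstantMemory; }
};

// Stand-in functors; LargeFunctor plays the role of the ">4 kB" test case.
struct SmallFunctor {
  double scale;
  void operator()(int) const {}
};

struct LargeFunctor {
  double table[1024];  // 8192 bytes, above the 4096-byte limit
  void operator()(int) const {}
};

// Guard so the test functor cannot silently shrink below the limit.
static_assert(sizeof(LargeFunctor) > kMaxParamBytes,
              "LargeFunctor must exceed the kernel parameter limit");
```

With these definitions, `LaunchSelector<SmallFunctor>::path()` yields `LaunchPath::LocalMemory` and `LaunchSelector<LargeFunctor>::path()` yields `LaunchPath::ConstantMemory`. A real fix would select between the two launch kernels at this point; the sketch only shows the compile-time dispatch.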