add type trait 'remove_restrict' #1474

psychocoderHPC
Member

fix #1472

Provide a type trait to remove __restrict__ from a type.
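
For readers unfamiliar with the idea, a trait of this kind can be sketched with a primary template plus one partial specialization for restrict-qualified pointers. This is only an illustrative sketch, not necessarily the code merged in this PR: the `sketch` namespace and the `SKETCH_RESTRICT` macro are hypothetical names, and the macro assumes that MSVC spells the qualifier `__restrict` while GCC, Clang, and nvcc accept `__restrict__`.

```cpp
#include <type_traits>

// Assumption: MSVC uses __restrict, other compilers accept __restrict__.
#if defined(_MSC_VER) && !defined(__clang__)
#    define SKETCH_RESTRICT __restrict
#else
#    define SKETCH_RESTRICT __restrict__
#endif

namespace sketch // hypothetical namespace, for illustration only
{
    // Primary template: a type that is not a restrict-qualified pointer is left unchanged.
    template<typename T>
    struct remove_restrict
    {
        using type = T;
    };

    // Partial specialization: strip the qualifier from a restrict-qualified pointer.
    template<typename T>
    struct remove_restrict<T* SKETCH_RESTRICT>
    {
        using type = T*;
    };

    // Convenience alias in the style of the standard *_t helpers.
    template<typename T>
    using remove_restrict_t = typename remove_restrict<T>::type;
} // namespace sketch

// Compile-time checks of the intended behaviour.
static_assert(std::is_same_v<sketch::remove_restrict_t<int* SKETCH_RESTRICT>, int*>);
static_assert(std::is_same_v<sketch::remove_restrict_t<float const* SKETCH_RESTRICT>, float const*>);
static_assert(std::is_same_v<sketch::remove_restrict_t<double>, double>);
```

Applying such a trait to each kernel argument type before the types are spelled out as template arguments is one plausible use, though how the PR wires it into the kernel launch is not shown in this conversation.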

@j-stephan
Member

Is removing __restrict__ what we want to achieve here? Removing it against the user's explicit wish feels wrong to me.

@fwyzard
Contributor

fwyzard commented Nov 18, 2021

> Is removing __restrict__ what we want to achieve here? Removing it against the user's explicit wish feels wrong to me.

My impression is that for the CUDA and HIP backends, __restrict__ is not carried over across the call from host to device anyway. The pointers need to be declared as __restrict__ inside the kernel (on the device side).

For CPU backends the __restrict__ on the "host" side could still be meaningful (not sure about useful).
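
The point about declaring the pointers inside the kernel can be illustrated with a minimal plain C++ sketch. It assumes a hypothetical functor `AxpyKernelSketch` that is not part of alpaka, leaves out the accelerator argument a real alpaka kernel would take, and reuses the hypothetical `SKETCH_RESTRICT` macro to cover the MSVC spelling. The functor receives ordinary pointers and re-declares them as restrict-qualified locals, so the no-aliasing promise is made where the loop is compiled rather than at the call site.

```cpp
#include <cstddef>

// Assumption: MSVC uses __restrict, other compilers accept __restrict__.
#if defined(_MSC_VER) && !defined(__clang__)
#    define SKETCH_RESTRICT __restrict
#else
#    define SKETCH_RESTRICT __restrict__
#endif

// Hypothetical kernel functor, for illustration only (not taken from alpaka).
struct AxpyKernelSketch
{
    void operator()(float a, float const* xIn, float* yIn, std::size_t n) const
    {
        // Re-declare the arguments inside the kernel body. The caller must
        // guarantee that x and y do not overlap, otherwise behaviour is undefined.
        float const* SKETCH_RESTRICT x = xIn;
        float* SKETCH_RESTRICT y = yIn;

        for(std::size_t i = 0; i < n; ++i)
        {
            y[i] += a * x[i];
        }
    }
};
```

Whether the qualifier yields a measurable speed-up depends on the compiler and backend; the sketch only shows where the declaration would sit.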

fwyzard previously approved these changes Nov 18, 2021
@j-stephan
Member

Got it, thanks!

fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Nov 19, 2021
fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Nov 19, 2021
fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Nov 23, 2021
@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch from 80b1e13 to ab9c10f Compare November 24, 2021 14:08
@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch 2 times, most recently from 32599d4 to 14f53e7 Compare November 26, 2021 07:55
@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch from 14f53e7 to 79a68ae Compare November 26, 2021 09:46
@psychocoderHPC
Member Author

psychocoderHPC commented Nov 26, 2021

Windows CI error in all runs with debug enabled:

2021-11-26T11:32:07.5385822Z   D:\a\alpaka\alpaka\build\example\helloWorld>"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2\bin\nvcc.exe" -gencode=arch=compute_52,code=\"compute_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu   -ID:\a\alpaka\alpaka\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include" -ID:\a\alpaka\alpaka\boost -I"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2\include"     --keep-dir x64\Debug  -maxrregcount=0  --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -lineinfo -Xcudafe=--display_error_number -Xcudafe=--diag_suppress=esa_on_defaulted_function_ignored -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /openmp" -g  -D_WINDOWS -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -DALPAKA_DEBUG_OFFLOAD_ASSUME_HOST -DALPAKA_OFFLOAD_MAX_BLOCK_SIZE=256 -DALPAKA_BLOCK_SHARED_DYN_MEMBER_ALLOC_KIB=47 -DALPAKA_CI -D"CMAKE_INTDIR=\"Debug\"" -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -Xcompiler "/EHsc /W1 /nologo /Od /FdhelloWorld.dir\Debug\vc142.pdb /FS /Zi /RTC1 /MDd /GR" -o helloWorld.dir\Debug\helloWorld.obj "D:\a\alpaka\alpaka\example\helloWorld\src\helloWorld.cpp" 
2021-11-26T11:32:13.7723136Z   helloWorld.cpp
2021-11-26T11:32:14.9954951Z C:\Users\runneradmin\AppData\Local\Temp\tmpxft_000007fc_00000000-7_helloWorld.cudafe1.stub.c(27): 
error C2912: explicit specialization 'void alpaka::uniform_cuda_hip::detail::
__wrapper__device_stub_uniformCudaHipKernel<alpaka::AccGpuCudaRt<std::integral_constant<unsigned __int64,3>,unsigned __int64>,std::integral_constant<unsigned __int64,3>,unsigned __int64,HelloWorldKernel>(const _ZN6alpaka3VecISt17integral_constantIyLy3EEyEE &,const HelloWorldKernel &)' 
is not a specialization of a function template 
[D:\a\alpaka\alpaka\build\example\helloWorld\helloWorld.vcxproj]
2021-11-26T11:32:15.0268017Z C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.2.targets(785,9): error MSB3721: The command ""C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2\bin\nvcc.exe" 

@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch from 79a68ae to ef8815a Compare November 26, 2021 12:53
@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch from ef8815a to 398d7a6 Compare November 26, 2021 15:44
@psychocoderHPC psychocoderHPC marked this pull request as draft November 26, 2021 20:52
@psychocoderHPC
Member Author

Switched to draft. I changed some lines to get the Windows CI to pass. On Monday I will run some more tests if all CI jobs have passed.

fix alpaka-group#1472

Provide a type trait to remove __restrict__ from a type.
@psychocoderHPC psychocoderHPC force-pushed the fix-cudaKernelCallWithRestrictedPointers branch from 398d7a6 to cffcc92 Compare November 29, 2021 07:04
@psychocoderHPC psychocoderHPC marked this pull request as ready for review November 29, 2021 12:17
@psychocoderHPC
Member Author

This PR is now passing the CI and can be reviewed.

@j-stephan j-stephan merged commit e1308c8 into alpaka-group:develop Nov 29, 2021
@psychocoderHPC psychocoderHPC deleted the fix-cudaKernelCallWithRestrictedPointers branch November 30, 2021 09:13
fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Dec 5, 2021
fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Dec 21, 2021
fwyzard added a commit to cms-patatrack/alpaka that referenced this pull request Dec 21, 2021
Development

Successfully merging this pull request may close these issues.

Cannot compile with GCC 10 or later for the CUDA backend if the kernel functor is templated