Problem with hipMemcpyToSymbol #1920

Closed
Rombur opened this issue Mar 9, 2020 · 2 comments

Rombur commented Mar 9, 2020

I have this problem when compiling the HIP backend in Kokkos. I tried to create a small reproducer to show the problem, but I couldn't. Here is what the code does: we create a global variable on the host and a global variable on the device. We use hipMalloc to allocate memory and store the resulting pointers in the host global variable. We then use hipMemcpyToSymbol to copy these pointers to the device global variable. This works fine if the device global variable is not referenced in another file. If it is, the data is never copied. To show the problem, I have created a second device global variable that goes through the same code path but is not referenced in any other file.

===========================================
host pointer 0 0
host pointer 0x2aaab7214000 0x2aaab7294000
device addresses2 (nil) (nil)
symbol size 24
symbol address 0x2aaec1186018
device addresses2 0x2aaab7214000 0x2aaab7294000
I am here 
me too
===========================================
host pointer 0 0
host pointer 0x2aaab7214000 0x2aaab7294000
device addresses (nil) (nil)
symbol size 24
symbol address 0x2aaec116a000
device addresses (nil) (nil)
I am here
Memory access fault by GPU node-4 (Agent handle: 0x13c01e0) on address 0x3000. Reason: Page not present or supervisor privilege.
srun: error: node003: task 0: Aborted

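As you can see, the copy works for the second variable (device addresses2) but not for the original one: its device addresses are still (nil) after hipMemcpyToSymbol, and the kernel then faults. The data is never copied.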

If you comment out the body of __device__ inline void operator()(void) const in core/src/HIP/Kokkos_HIP_Parallel_Team.hpp (around line 411), i.e. remove every use of the device global variable there, everything works fine. Note that this code has not yet been executed when I try to copy the data; its mere presence changes the behavior of the other function.

The relevant code is here, here, here, here, and here

How to reproduce the error:

  • git clone https://github.com/Rombur/kokkos.git
  • cd kokkos
  • git checkout parallel_team_bug
  • mkdir build && cd build
  • cmake -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_CXX_FLAGS="-fno-gpu-rdc" -DCMAKE_BUILD_TYPE=Debug -DKokkos_ENABLE_HIP=ON -DKokkos_ENABLE_SERIAL=ON -DKokkos_ENABLE_TESTS=ON -DKokkos_ENABLE_EXAMPLES=OFF -DKokkos_ARCH_MI60=ON ..
  • make
  • cd core/unit_test
  • ./KokkosCore_UnitTest_HIP

I am using:

HIP version: 3.1.20065-2a03c8da
HCC clang version 10.0.0
Rombur added commits to Rombur/kokkos that referenced this issue on Mar 11 (twice), Mar 18, Mar 19, and Mar 25, 2020, each with the message:
This is not used for now due to a bug in hipcc (see ROCm/HIP#1920)
@ppanchad-amd

@Rombur Sorry for the lack of response. Please try the latest ROCm 6.0.2 (HIP 6.0.32831) to see if your issue still exists. If it is resolved, please close the ticket. Thanks.

Rombur commented Mar 19, 2024

It has been fixed. Closing this.

Rombur closed this as completed Mar 19, 2024