DualView: subview behaves differently in v3.1 than v3.0 with CUDA #2981

Closed
kddevin opened this issue Apr 27, 2020 · 1 comment

kddevin commented Apr 27, 2020

The behavior of DualView with CUDA differs in the v3.1 Trilinos snapshot from the previous version in Trilinos. Here's a test that demonstrates the problem.

KokkosDualViewTest.txt

With v3.1 in Trilinos, this test succeeds when the DualView type uses Kokkos::Serial, but with Kokkos::Cuda it fails in sync_host with the following error.

dualView extent = 3 1
firstBlockView extent = 2 1
firstBlockView syncing to host NOW
firstBlockView syncing to host DONE
secondBlockView extent = 2 1
secondBlockView syncing to host NOW
secondBlockView syncing to host DONE
thirdBlockView extent = 0 1
thirdBlockView syncing to host NOW
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaPointerGetAttributes(&attr, ptr) error( cudaErrorInvalidValue): invalid argument /ascldap/users/kddevin/code/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:922
Traceback functionality not available

With the April 13 version, from before the snapshot into Trilinos, the same test produces:

dualView extent = 3 1
firstBlockView extent = 2 1
firstBlockView syncing to host NOW
firstBlockView syncing to host DONE
secondBlockView extent = 2 1
secondBlockView syncing to host NOW
secondBlockView syncing to host DONE
thirdBlockView extent = 0 1
thirdBlockView syncing to host NOW
thirdBlockView syncing to host DONE

I know that thirdBlockView points to the end of the DualView (its first extent is 0). Being able to take such a subview is useful for accessing blocks of Tpetra::Vectors (e.g., owned vs. shared unknowns) in assembly. Tpetra relies on it when getting an offsetViewNonConst and offsetView.

The change in behavior causes an error in Empire testing:
trilinos/Trilinos#7234
trilinos/Trilinos#7233

In the Empire use case, one processor does not have shared values, while other processors do have shared values. The processor without shared values needs to point to the end of the DualView.
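
To make the use case concrete, here is a minimal sketch in plain C++ (no Kokkos involved) of a vector split into an owned block followed by a shared block. The names `BlockView`, `make_block`, and `sum_block` are hypothetical and exist only for illustration; they are not part of Kokkos or Tpetra. The sketch only shows why a zero-length block that starts one past the last row is a well-defined thing to ask for, as long as it is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a subview: rows [offset, offset + extent) of a vector.
struct BlockView {
  double* data;        // one-past-the-end when extent == 0 at the end; never dereferenced then
  std::size_t extent;  // number of rows in the block
};

// Returns the block of `count` rows starting at `offset`.
// offset == storage.size() with count == 0 is accepted: an empty block at the end.
inline BlockView make_block(std::vector<double>& storage,
                            std::size_t offset, std::size_t count) {
  assert(offset <= storage.size());
  assert(count <= storage.size() - offset);
  return BlockView{storage.data() + offset, count};
}

// Sums the rows of a block; the loop body never runs for an empty block.
inline double sum_block(const BlockView& block) {
  double total = 0.0;
  for (std::size_t i = 0; i < block.extent; ++i) {
    total += block.data[i];
  }
  return total;
}
```

On a processor with three owned rows and no shared rows, `make_block(v, 0, 3)` would be the owned block and `make_block(v, 3, 0)` the empty shared block, so the same assembly code can run on every processor without a special case.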

@rppawlo

kddevin commented Apr 27, 2020

I see this issue is likely fixed by #2979.
I will close this issue and reopen it if the problem persists.

kddevin closed this as completed Apr 27, 2020