
[BUG] SetVals failed on tensors that are created from user pointers #175

Closed · AtomicVar opened this issue May 2, 2022 · 5 comments · Fixed by #176

Comments

AtomicVar (Contributor)

Describe the bug
When SetVals is called on a tensor created from a user pointer, a segmentation fault occurs.

To Reproduce

// Allocate device-only memory and wrap it in a non-owning MatX tensor
float* dev_float;
cudaMalloc(&dev_float, sizeof(float) * 6);

auto t = matx::make_tensor<float, 2, matx::non_owning>(dev_float, {2, 3});
t.SetVals({{1, 2, 3}, {4, 5, 6}});  // crashes: SetVals writes to the memory from the host
t.Print();

VSCode debug error: [screenshot]

Expected behavior
SetVals should work on tensors created from user pointers.

System details (please complete the following information):

  • OS: Ubuntu 20.04
  • CUDA version: CUDA 11.4
  • g++ version: 9.3.0
luitjens (Collaborator) commented May 2, 2022

I'm not sure this is a bug. You can't call SetVals on non-managed memory, since it runs on the host. Can you change cudaMalloc to cudaMallocManaged instead?
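
For reference, a minimal sketch of this workaround, assuming the same make_tensor overload used in the report but backing the tensor with cudaMallocManaged so the host-side SetVals writes are addressable (illustrative only, not part of the library):

#include <matx.h>

int main() {
  float* managed_float;
  cudaMallocManaged(&managed_float, sizeof(float) * 6);  // visible to host and device

  // Same non-owning tensor as in the report, now over managed memory
  auto t = matx::make_tensor<float, 2, matx::non_owning>(managed_float, {2, 3});
  t.SetVals({{1, 2, 3}, {4, 5, 6}});  // host write is valid on managed memory
  t.Print();

  cudaFree(managed_float);
  return 0;
}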

cliffburdick (Collaborator)
There are two options here: we can throw an exception and shut down, or we can detect that it's not managed memory and do a cudaMemcpy. I tend to think the latter is better, since it should be more consistent.
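
As an illustration of the "detect it's not managed memory" branch, a check along these lines could classify the pointer before deciding how to write to it. This is a hypothetical helper (host_writable is an invented name), not MatX's actual implementation:

#include <cuda_runtime.h>

// Hypothetical check: can this pointer be written directly from the host?
bool host_writable(const void* ptr) {
  cudaPointerAttributes attr{};
  if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
    (void)cudaGetLastError();  // clear the sticky error from the failed query
    return true;               // older CUDA versions report plain host pointers as an error
  }
  // Managed, pinned host, and unregistered host memory are all host-writable;
  // only device-only allocations (the cudaMalloc case above) are not.
  return attr.type != cudaMemoryTypeDevice;
}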

cliffburdick (Collaborator)
@ZJUGuoShuai the issue with passing a device pointer and doing a cudaMemcpy is that you're technically allowed to do something like:

auto a = make_tensor<float>({3,3});
a.SetVals({{0},{1,2}});

In this example we set partial values per row and leave the rest uninitialized. Obviously we can't do a bulk cudaMemcpy in that case. The other option is to launch a cudaMemcpy for every single value. That will be extremely inefficient, but users of SetVals are hopefully not calling it inside performance-critical code anyway, since it's intended for setting up values at the beginning of the application. I'm inclined to say this method is "best" because it gives the expected results, and we can document that it will be very slow.

Either way, this also has to be blocking on stream 0 since we can't make this asynchronous at the moment.
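
For concreteness, the per-value copy option discussed above could look roughly like the following. The set_vals_2d helper is invented for illustration and is not MatX code:

#include <cuda_runtime.h>
#include <initializer_list>
#include <cstddef>

// Hypothetical helper: copy a (possibly ragged) initializer list into device
// memory one element at a time. Rows with fewer values than columns simply
// leave the remaining elements untouched, matching the partial-initialization
// case above, at the cost of one blocking cudaMemcpy per element.
void set_vals_2d(float* dev_ptr, std::size_t row_stride,
                 std::initializer_list<std::initializer_list<float>> vals) {
  std::size_t r = 0;
  for (const auto& row : vals) {
    std::size_t c = 0;
    for (float v : row) {
      cudaMemcpy(dev_ptr + r * row_stride + c, &v, sizeof(float),
                 cudaMemcpyHostToDevice);
      ++c;
    }
    ++r;
  }
}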

luitjens (Collaborator) commented May 2, 2022

I think this is just an invalid usage of the APIs and not a bug. Detecting all cases of invalid memory accesses is not a requirement that any library/SDK should take on. We can clean up documentation to make it clear that wrapping existing memory may be limited if it is not addressable from the host and device.

cliffburdick (Collaborator)
@ZJUGuoShuai we closed this and now print an error, for the reasons above. Please reopen if you have suggestions on how to improve it.
