
derivative value of 0 when running the CUDA test script #744

Closed
stebos100 opened this issue Jan 31, 2024 · 2 comments · Fixed by #806
@stebos100

Hi @ioanaif !

While running the GradientCuda.cu file locally, I see that all printed gradients are 0.00. Furthermore, the second printf statement inside the global kernel is never executed, which indicates that the kernel launch fails. This is confirmed when I place a standard CUDA error check right after the kernel launch <<<>>>:

cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
  // Report why the kernel launch failed.
  printf("Error after kernel launch: %s\n", cudaGetErrorString(err));
}

An error is indeed flagged. The kernel code is the same as in the GradientCuda.cu test script and is shown below.


__global__ void compute(decltype(gauss_g) grad, double* d_x, double* d_p, int n, double* d_result) {

  printf("We have entered the global kernel\n");

  grad.execute(d_x, d_p, 2.0, n, d_result);

  printf("We have successfully performed a gradient execution on the GPU\n");

}

@vgvassilev added this to the v1.5 milestone on Feb 12, 2024
@kchristin22
Collaborator

Hi!

I took a look at this, and the problem is that gauss_g is an object located in host memory; hence, when it is passed as an argument to the kernel, the device (GPU) cannot access it.

This leads us to consider the following options:

  • clad::gradient is both a device and a host function, so we can compute gauss_g inside the kernel (see the sketch after this list). This approach is quite efficient, since there is no need to allocate extra device memory or copy data from the CPU, which would slow down performance.
    • If we need to give the user the ability to choose which function from a pool to differentiate, an enum can be passed as an argument to the kernel; this also doesn't require a large amount of data to be allocated.
    • We could extend this to reading the function to differentiate from a file, but only if it can be read at compile time and defined as a host/device function.
  • Pass a pointer to the function to differentiate to the kernel --> clad::gradient cannot handle function pointers, as it searches for argument names.
  • Pass the gradient object to the kernel --> this cannot be done, as the object contains a pointer to its function that refers to a host location, so we would also have to transfer the function's code to the GPU without knowing its length.
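
For illustration, here is a minimal sketch of the first option. It assumes clad::gradient is usable in device code, as stated above; the gauss signature is reproduced approximately from the GradientCuda.cu demo, and the includes and kernel shape are illustrative rather than the actual patch.

#include "clad/Differentiator/Differentiator.h"
#include <cmath>

// gauss as differentiated in GradientCuda.cu (approximate reproduction).
__device__ __host__ double gauss(double* x, double* p, double sigma, int dim) {
  double t = 0;
  for (int i = 0; i < dim; i++)
    t += (x[i] - p[i]) * (x[i] - p[i]);
  t = -t / (2 * sigma * sigma);
  return std::pow(2 * M_PI, -dim / 2.0) * std::pow(sigma, -0.5) * std::exp(t);
}

__global__ void compute(double* d_x, double* d_p, int n, double* d_result) {
  // Build the gradient object on the device itself, so the kernel never
  // dereferences a host pointer.
  auto gauss_g = clad::gradient(gauss, "p");
  gauss_g.execute(d_x, d_p, /*sigma=*/2.0, n, d_result);
}

On the host side, only device buffers (allocated with cudaMalloc) are passed to compute<<<...>>>, so no gradient object ever crosses the host/device boundary.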

@vgvassilev @vaithak I would really like to hear your thoughts on the above and any additional ideas you might have. Otherwise, I can open a pull request following the first option, which I have already confirmed works.

@vgvassilev
Owner

Yes, please go ahead with the pull request.
