
derivative value of 0 when running the CUDA test script #744

Closed
stebos100 opened this issue Jan 31, 2024 · 2 comments · Fixed by #806
@stebos100

Hi @ioanaif !

While running the GradientCuda.cu file locally, I see that all printed gradients are 0.00. Furthermore, the second printf statement inside the global kernel is never executed, which indicates that the kernel launch fails. This is confirmed when I place a standard CUDA error check right after the kernel launch <<<>>>:

cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
  // Report why the kernel launch failed.
  printf("Error after kernel launch: %s\n", cudaGetErrorString(err));
}

An error is indeed flagged. The kernel code is the same as in the GradientCuda.cu test script and is shown below.


__global__ void compute(decltype(gauss_g) grad, double* d_x, double* d_p, int n, double* d_result) {

  printf("We have entered the global kernel\n");

  grad.execute(d_x, d_p, 2.0, n, d_result);

  printf("We have successfully performed a gradient execution on the GPU\n");

}

@vgvassilev added this to the v1.5 milestone on Feb 12, 2024
@kchristin22
Collaborator

Hi!

I took a look at this, and the problem is that gauss_g is an object located in host memory; hence, when it is passed as an argument to the kernel, the device (GPU) cannot access it.

This leads us to consider the following options:

  • clad::gradient is both a device and a host function, so we can compute gauss_g inside the kernel (see the sketch after this list). This approach is quite efficient, since there is no need to allocate extra device memory or copy data from the CPU, which would slow down performance.
    • If we need to give the user the ability to choose which function from a pool to differentiate, an enum can be passed as an argument to the kernel; this also doesn't require a large amount of data to be allocated.
    • We could extend this to reading the function to differentiate from a file, but only if it can be read at compile time and defined as a host/device function.
  • Pass a pointer to the function to differentiate to the kernel --> clad::gradient cannot handle function pointers, as it searches for argument names.
  • Pass the gradient object to the kernel --> this cannot be done, as the object contains a pointer to its function that refers to a host location, so we would also have to transfer the function's code to the GPU without knowing its length.
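
For illustration, here is a minimal sketch of the first option. It assumes clad::gradient is usable in device code, as stated above; the gauss signature is reproduced approximately from the GradientCuda.cu demo, and the includes and kernel shape are illustrative rather than the actual patch.

#include "clad/Differentiator/Differentiator.h"
#include <cmath>

// gauss as differentiated in GradientCuda.cu (approximate reproduction).
__device__ __host__ double gauss(double* x, double* p, double sigma, int dim) {
  double t = 0;
  for (int i = 0; i < dim; i++)
    t += (x[i] - p[i]) * (x[i] - p[i]);
  t = -t / (2 * sigma * sigma);
  return std::pow(2 * M_PI, -dim / 2.0) * std::pow(sigma, -0.5) * std::exp(t);
}

__global__ void compute(double* d_x, double* d_p, int n, double* d_result) {
  // Build the gradient object on the device itself, so the kernel never
  // dereferences a host pointer.
  auto gauss_g = clad::gradient(gauss, "p");
  gauss_g.execute(d_x, d_p, /*sigma=*/2.0, n, d_result);
}

On the host side, only device buffers (allocated with cudaMalloc) are passed to compute<<<...>>>, so no gradient object ever crosses the host/device boundary.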

@vgvassilev @vaithak I would really like to hear your thoughts on the above and any additional ideas you might have. Otherwise, I can open a pull request following the first option, which I have already confirmed works.

@vgvassilev
Owner

Yes, please go ahead with the pull request.
