Running the GradientCuda.cu file locally, I see that all printed gradients are 0.00. Furthermore, when the global kernel is executed, the second printf statement is never reached, indicating that the kernel launch fails. This is confirmed when I place a standard CUDA error check after the kernel launch <<<>>>:
cudaError_t err = cudaGetLastError();
if (cudaSuccess != err) {
    printf("Error after kernel launch: %s\n", cudaGetErrorString(err));
}
An error is flagged. The kernel code is the same as in the GradientCuda.cu test script, as shown below:
__global__ void compute(decltype(gauss_g) grad, double* d_x, double* d_p, int n, double* d_result) {
    printf("We have entered the global kernel\n");
    // Run the gradient object produced on the host against the device buffers.
    grad.execute(d_x, d_p, 2.0, n, d_result);
    printf("We have successfully performed a gradient execution on the GPU\n");
}
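For completeness, this is roughly how the launch-side check fits together (a sketch: the <<<1, 1>>> launch configuration and variable names are placeholders, not the demo's exact code). cudaGetLastError() catches launch/configuration failures, while cudaDeviceSynchronize() surfaces errors raised while the kernel itself runs:

compute<<<1, 1>>>(gauss_g, d_x, d_p, n, d_result);

// Catches launch/configuration errors.
cudaError_t launchErr = cudaGetLastError();
if (launchErr != cudaSuccess)
    printf("Error after kernel launch: %s\n", cudaGetErrorString(launchErr));

// Catches errors raised during kernel execution.
cudaError_t syncErr = cudaDeviceSynchronize();
if (syncErr != cudaSuccess)
    printf("Error during kernel execution: %s\n", cudaGetErrorString(syncErr));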
I took a look at this, and the problem is that gauss_g is an object located in host memory; hence, when it is passed as an argument to the kernel, the device (GPU) cannot access it.
The above fact leads us to consider the following options:
clad::gradient is both a device and a host function, so we can compute gauss_g inside the kernel (see the sketch after this list). This approach is quite efficient, as there is no need to allocate extra device memory or copy data from the CPU, which would slow down performance.
If we need to give the user the ability to choose which function from a pool to differentiate, an enum can be passed as an argument to the kernel, which also does not require a large amount of data to be allocated.
We can support reading the function to differentiate from a file only if the file can be read at compile time and the function can be defined as a host/device function.
Pass a pointer to the function to differentiate to the kernel --> clad::gradient cannot handle function pointers, as it searches for argument names.
Pass the gradient object to the kernel --> cannot be done, as it contains a pointer to its function that points to a host location; hence, we would also need to transfer the function code to the GPU without knowing its length.
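For reference, here is a minimal sketch of the first two options, assuming clad::gradient is usable from device code as stated above. The gauss function and the execute arguments follow the GradientCuda.cu demo; the computeGrad and computeSelected kernels and the Fn enum are hypothetical names introduced only for illustration:

// Option 1: differentiate inside the kernel, so no host-side CladFunction
// object needs to be copied to the device.
__global__ void computeGrad(double* d_x, double* d_p, int n, double* d_result) {
    auto grad = clad::gradient(gauss);
    grad.execute(d_x, d_p, 2.0, n, d_result);
}

// Option 2: let the caller pick the function to differentiate via an enum,
// which only adds a single scalar kernel argument.
enum class Fn { Gauss /*, other functions from the pool */ };

__global__ void computeSelected(Fn fn, double* d_x, double* d_p, int n, double* d_result) {
    switch (fn) {
    case Fn::Gauss: {
        auto grad = clad::gradient(gauss);
        grad.execute(d_x, d_p, 2.0, n, d_result);
        break;
    }
    // Further cases would differentiate other host/device functions.
    }
}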
@vgvassilev @vaithak I would really like to hear your thoughts on the above and any additional ideas you might have. Otherwise, I could open a pull request following the first option, which I have already confirmed works.
Hi @ioanaif!