Fix CUDA gradient script #806
Conversation
It would also be safer to add checks for CUDA errors when calling CUDA functions, but the final comparison of the GPU and CPU results will hint at a need for debugging, so I believe there's no need to "clutter" the code further.
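For illustration, the kind of error checking discussed here is often done with a small wrapper macro. This is a hypothetical sketch, not part of the PR; the macro name and error handling strategy are assumptions:

```
// Hypothetical CUDA error-check macro; not part of this PR, shown only to
// illustrate the kind of checks discussed above.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err = (call);                                     \
    if (err != cudaSuccess) {                                     \
      fprintf(stderr, "CUDA error '%s' at %s:%d\n",               \
              cudaGetErrorString(err), __FILE__, __LINE__);       \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

// Usage: CUDA_CHECK(cudaMalloc(&d_x, N * sizeof(double)));
```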
@@ -15,6 +15,7 @@
// XFAIL: clang-15
Can we understand why it fails for clang-15?
My first guess would be that the local CUDA version where this was tested and failed was not compatible with clang-15. I will have a look.
Unfortunately, I haven't found a CUDA version old enough (prior to 11.5) and compatible with my system (Ubuntu 22.04) to test this with clang-15, but upon examining another issue I came across a similar failure of clang-15 and clang-14 (didn't test with clang-16). Relevant comment here.
This looks good to me. We probably should open a new issue and a separate PR to continue the XFAIL: clang-15 discussion there.
I can open one if you'd like.
I was also thinking that if the CI were expanded to include a CUDA setup, we could catch the build errors directly in the PR (harder for development, but it would be beneficial for developers on the latest OS versions, where older versions of clang are hard to test locally). This could be a separate issue.
Sounds good.
test/CUDA/GradientCuda.cu
@@ -109,14 +110,24 @@ int main(void) {
   cudaMemcpy(d_x, x, N * sizeof(double), cudaMemcpyHostToDevice);
   cudaMalloc(&d_p, N * sizeof(double));
   cudaMemcpy(d_p, p, N * sizeof(double), cudaMemcpyHostToDevice);
-  double *result, *d_result;
+  std::vector<double> result(N, 0);
We probably do not need a std::vector here. Can we use a fixed size array or an std::array?
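The fixed-size alternative suggested here could look like the sketch below. The size constant and helper name are assumptions for illustration; in GradientCuda.cu, N would be the test's own constant:

```cpp
#include <array>
#include <cstddef>

// Assumed compile-time size; the actual test defines its own N.
constexpr std::size_t N = 10;

// Zero-initialized fixed-size host buffer, replacing
// std::vector<double> result(N, 0). std::array avoids a heap allocation.
std::array<double, N> make_result_buffer() {
  std::array<double, N> result{};  // value-initialized: all elements are 0.0
  return result;
}

// The raw pointer for the device-to-host copy would come from result.data():
// cudaMemcpy(result.data(), d_result, N * sizeof(double), cudaMemcpyDeviceToHost);
```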
LGTM. See my comment.
Fixes #744.
As described here, the gradient computation of the function of interest was moved inside the kernel. The results returned from the GPU were compared with the ones derived from the CPU execution to establish the correctness of the implementation.
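The GPU-vs-CPU comparison described above amounts to an element-wise tolerance check. A minimal sketch follows; the function name and tolerance are assumptions, not the test's actual code:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper illustrating the GPU-vs-CPU correctness check; the
// PR's test may use different names and a different tolerance.
bool results_match(const double* gpu, const double* cpu, std::size_t n,
                   double tol = 1e-9) {
  for (std::size_t i = 0; i < n; ++i)
    if (std::fabs(gpu[i] - cpu[i]) > tol)
      return false;  // a mismatch hints that the kernel needs debugging
  return true;
}
```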