[REVIEW] impl and refactor device_scalar::set_value and value, respectively. #167
…(T)` to `void set_value(T)`, adjust docs
This change breaks cudf, so I've created a sister PR here: rapidsai/cudf#3211
Accidentally approved last time.
I added async versions of |
Hopefully we can simplify this a lot based on #169.
Thanks for taking this up.
Need to make synchronization explicit.
include/rmm/device_scalar.hpp
Outdated
@@ -92,6 +92,14 @@ class device_scalar {

 private:
  rmm::device_buffer buff{sizeof(T)};

  inline void _memcpy(void *dst, const void *src, cudaStream_t stream) const {
    auto status = cudaMemcpyAsync(dst, src, sizeof(T), cudaMemcpyDefault,
Please either use cudaMemcpy or add a cudaStreamSynchronize() after this. Otherwise the function appears to be asynchronous, which is at odds with the documentation.
I know I said that cudaMemcpyAsync is not async when either src or dst is un-pinned host memory, but even so we should make the synchronization explicit.
We cannot specify the stream with cudaMemcpy, but Jake mentioned specifying the stream may still be beneficial. I'm not sure how to proceed.
Right, it's still better to use cudaMemcpyAsync because it lets us sync a stream that's NOT the null stream (synchronizing the NULL stream is the same as cudaDeviceSynchronize(), which is overkill). To proceed, just add in cudaStreamSynchronize(stream) after the cudaMemcpyAsync like you had before.
Here's what you need, I believe.
Addresses #166
Depends on #169