Replace calls to cudaDeviceSynchronize with calls to only synchronize the default CUDA stream #12687
Overview
The CUDA Programming Guide states that `cudaDeviceSynchronize` has the following synchronization behavior: "cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed." (source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization). The RealSense SDK uses `cudaDeviceSynchronize` in the alignment filter and the color conversion functions, so those functions can block until CUDA work running on other streams has completed. This issue arises whenever the user's application runs CUDA code on separate CUDA streams.

A solution is to replace calls to `cudaDeviceSynchronize` with `cudaStreamSynchronize(0)`, where `0` is the default CUDA stream. This changes the SDK behavior to wait only for the default stream, rather than for all streams executing CUDA code. Please let me know if there is a different approach the project would like to take, and I can help contribute.
Related issues

I initially opened and discussed this change in #12680. This PR would resolve that issue.