Replace calls to cudaDeviceSynchronize with calls to only synchronize the default CUDA stream #12687

m-mead commented Feb 19, 2024

Overview

The CUDA Programming Guide describes the synchronization behavior of cudaDeviceSynchronize as follows: "cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed." (source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization). The RealSense SDK calls cudaDeviceSynchronize in the alignment filter and color conversion functions. As a result, those functions also wait for CUDA work queued on other streams to complete. This affects any user who runs their own CUDA code in separate CUDA streams in the same application.

One solution is to replace the calls to cudaDeviceSynchronize with cudaStreamSynchronize(0), where 0 denotes the default CUDA stream. The SDK then waits only for work on the default stream to complete, rather than for all streams executing CUDA code. Please let me know if the project would prefer a different approach; I am happy to help contribute.
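
To illustrate the difference, here is a minimal standalone sketch (not code from the SDK). The kernels `busy_kernel` and `sdk_kernel` are hypothetical stand-ins for application work and SDK work. The sketch assumes the application creates its stream with `cudaStreamNonBlocking`; a stream created with plain `cudaStreamCreate` is implicitly ordered with the legacy default stream, so default-stream work could still end up waiting behind it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical application kernel: spins for roughly `cycles` clock ticks.
__global__ void busy_kernel(long long cycles)
{
    const long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical stand-in for SDK work launched on the default stream.
__global__ void sdk_kernel(int* out)
{
    *out = 42;
}

int main()
{
    int* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, sizeof(int)));

    // Application stream (assumption: non-blocking, so it is not
    // implicitly ordered with the legacy default stream).
    cudaStream_t app_stream;
    CHECK(cudaStreamCreateWithFlags(&app_stream, cudaStreamNonBlocking));

    // Long-running application work on its own stream.
    busy_kernel<<<1, 1, 0, app_stream>>>(1000000000LL);
    CHECK(cudaGetLastError());

    // SDK-like work on the default stream (stream 0).
    sdk_kernel<<<1, 1>>>(d_out);
    CHECK(cudaGetLastError());

    // Waits only for work queued on the default stream.
    CHECK(cudaStreamSynchronize(0));

    // Check whether the application stream is still busy at this point.
    const cudaError_t status = cudaStreamQuery(app_stream);
    std::printf("app stream after cudaStreamSynchronize(0): %s\n",
                status == cudaErrorNotReady ? "still running" : "finished");

    // Waits for all streams, including the application stream.
    CHECK(cudaDeviceSynchronize());

    int h_out = 0;
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("sdk_kernel result: %d\n", h_out);

    CHECK(cudaStreamDestroy(app_stream));
    CHECK(cudaFree(d_out));
    return 0;
}
```

Swapping the `cudaStreamSynchronize(0)` call for `cudaDeviceSynchronize()` in this sketch would make the host block until `busy_kernel` finishes as well, which is the behavior this PR aims to avoid.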

Related issues

I initially opened and discussed this change on #12680. This PR would resolve that issue.

The CUDA Programming Guide states that calling `cudaDeviceSynchronize`
has the following synchronization behavior: "cudaDeviceSynchronize()
waits until all preceding commands in all streams of all
host threads have completed."
(source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization)

The user may be running CUDA code in separate CUDA streams in their
application, and the `cudaDeviceSynchronize` call will wait for those
operations to finish if they are executing concurrently. The solution
taken in this commit is to replace calls to
`cudaDeviceSynchronize` with `cudaStreamSynchronize(0)`, where `0`
stands for the default CUDA stream. This changes the SDK behavior to
wait only for the default stream to synchronize, rather than all
streams executing CUDA code.

Signed-off-by: Michael Mead <[email protected]>

Nir-Az (Collaborator) commented Feb 20, 2024

Hi @m-mead, thank you for your contribution.
We will investigate the issue and verify your proposed change.
It may take some time so please be patient :)

m-mead (Author) commented Feb 20, 2024

Hi @Nir-Az, sounds good -- thank you! 👍

Arun-Prasad-V (Contributor) left a comment

LGTM.

@Nir-Az Nir-Az merged commit 4ee22d9 into IntelRealSense:development Mar 4, 2024
17 checks passed