Replace calls to cudaDeviceSynchronize with calls to only synchronize the default CUDA stream #12687

m-mead · 2024-02-19T18:51:37Z

Overview

The CUDA Programming Guide describes the synchronization behavior of cudaDeviceSynchronize as follows: "cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed." (source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization)

The RealSense SDK calls cudaDeviceSynchronize in the alignment filter and in the color conversion functions. As a result, those functions also wait for CUDA work running on other streams to complete. This affects any application that runs its own CUDA code in separate CUDA streams.

One solution is to replace the calls to cudaDeviceSynchronize with cudaStreamSynchronize(0), where 0 denotes the default CUDA stream. With this change the SDK waits only for work on the default stream to complete, rather than for every stream executing CUDA code. Please let me know if the project would prefer a different approach; I am happy to help contribute.
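To illustrate the difference, here is a minimal standalone sketch. It is not code from the SDK: user_kernel, sdk_kernel, and the buffer names are hypothetical stand-ins. It assumes the legacy default stream (the default compile mode) and a user stream created with cudaStreamNonBlocking, so that the user stream does not implicitly synchronize with the default stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical stand-in for long-running application work on a user stream.
__global__ void user_kernel(float* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) {
            v = v * 1.0001f + 0.5f;
        }
        data[i] = v;
    }
}

// Hypothetical stand-in for SDK work launched on the default stream.
__global__ void sdk_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

int main() {
    const int n = 1024;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* user_buf = nullptr;
    float* sdk_buf = nullptr;
    CHECK(cudaMalloc(&user_buf, n * sizeof(float)));
    CHECK(cudaMalloc(&sdk_buf, n * sizeof(float)));
    CHECK(cudaMemset(user_buf, 0, n * sizeof(float)));
    CHECK(cudaMemset(sdk_buf, 0, n * sizeof(float)));

    // Assumption: the application creates its stream as non-blocking.
    cudaStream_t user_stream;
    CHECK(cudaStreamCreateWithFlags(&user_stream, cudaStreamNonBlocking));

    // Application work on its own stream, then "SDK" work on the default stream.
    user_kernel<<<blocks, threads, 0, user_stream>>>(user_buf, n, 1 << 22);
    sdk_kernel<<<blocks, threads>>>(sdk_buf, n);
    CHECK(cudaGetLastError());

    // Waits only for the default stream. cudaDeviceSynchronize() here would
    // also wait for user_kernel to finish.
    CHECK(cudaStreamSynchronize(0));

    // cudaStreamQuery reports whether the user stream still has pending work.
    cudaError_t status = cudaStreamQuery(user_stream);
    if (status == cudaErrorNotReady) {
        std::printf("default stream done, user stream still running\n");
    } else {
        CHECK(status);
        std::printf("default stream done, user stream already finished\n");
    }

    CHECK(cudaStreamSynchronize(user_stream));
    CHECK(cudaStreamDestroy(user_stream));
    CHECK(cudaFree(user_buf));
    CHECK(cudaFree(sdk_buf));
    return 0;
}
```

Which message is printed depends on the GPU and on kernel timing, so treat the output as indicative only. If the cudaStreamSynchronize(0) call is swapped for cudaDeviceSynchronize(), the user stream should always be reported as finished.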

Related issues

I initially opened and discussed this change on #12680. This PR would resolve that issue.

The CUDA Programming Guide states that calling `cudaDeviceSynchronize` has the following synchronization behavior: "cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed." (source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization)

The user may be running CUDA code in separate CUDA streams in their application and the `cudaDeviceSynchronize` call will wait for those operations to finish if they are executing concurrently.

A solution to the problem, the one taken in this commit, is to replace calls to `cudaDeviceSynchronize` with `cudaStreamSynchronize(0)`, where `0` stands for the default CUDA stream. This will change the SDK behavior to only wait for the default stream to synchronize, rather than all streams executing CUDA code.

Signed-off-by: Michael Mead <[email protected]>

Nir-Az · 2024-02-20T10:07:25Z

Hi @m-mead , thank you for your contribution.
We will investigate the issue and verify your proposed change.
It may take some time so please be patient :)

m-mead · 2024-02-20T16:57:49Z

Hi @Nir-Az, sounds good -- thank you! 👍

Arun-Prasad-V

LGTM.

m-mead mentioned this pull request Feb 19, 2024

cudaDeviceSynchronize used in SDK filters requires all CUDA streams to complete #12680

Closed

Nir-Az requested a review from Arun-Prasad-V February 20, 2024 10:06

Arun-Prasad-V approved these changes Mar 4, 2024

Nir-Az merged commit 4ee22d9 into IntelRealSense:development Mar 4, 2024
17 checks passed
