cudaDeviceSynchronize used in SDK filters requires all CUDA streams to complete #12680

m-mead · 2024-02-18T19:02:07Z

Required Info
Camera Model	D400
Firmware Version	N/a
Operating System & Version	Linux, Windows
Kernel Version (Linux Only)	All
Platform	All
SDK Version	2.54.2
Language	C and C++
Segment

Issue Description

The RealSense SDK uses cudaDeviceSynchronize to synchronize GPU operations in the color conversion functions and the alignment filter. The problem is that cudaDeviceSynchronize waits for all operations on all streams to complete: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization. From my understanding of the code, the RealSense SDK does not need to wait on all streams, only on the one on which the filtering operations execute. Please correct me if I am wrong. 🙂

If the user's application runs CUDA code concurrently in separate CUDA streams, the cudaDeviceSynchronize call will also wait for those operations to finish. There are two possible solutions: place the RealSense SDK's CUDA operations on a separate stream, or call cudaStreamSynchronize with an argument of 0 to synchronize only the default stream, which is the stream the RealSense SDK uses. Either way, SDK CUDA operations would no longer block until other streams complete. The latter is simpler to implement and would not change the stream that users expect the RealSense SDK to use.
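To make the second option concrete, here is a minimal sketch of the pattern. It is not the actual librealsense code: `scale_kernel` is a hypothetical stand-in for an SDK filter kernel, and the buffer size is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for an SDK filter kernel (not the real librealsense kernel).
__global__ void scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Launch on the default stream (stream 0), which the issue says the SDK uses.
    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // Before: cudaDeviceSynchronize() waits for work on every stream of the device.
    // After: wait only for the work queued on the default stream.
    cudaError_t err = cudaStreamSynchronize(0);
    if (err != cudaSuccess) {
        std::printf("sync failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaFree(d_data);
    return 0;
}
```

One caveat with this approach: under the legacy default-stream semantics, work on stream 0 is still ordered after earlier work on blocking streams, so the largest benefit is for user streams created with cudaStreamNonBlocking.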

I am happy to contribute the changes if the RealSense team is interested. I searched for similar issues and could not find any.


MartyG-RealSense · 2024-02-19T12:34:51Z

Hi @m-mead, the only blocking in the RealSense SDK that I am aware of occurs when the WaitForFrames() instruction is used in a script, as described at #2422 (comment).

You are very welcome to submit a Pull Request (PR) so that your CUDA changes can be considered by the RealSense development team for inclusion in the SDK.

https://github.com/IntelRealSense/librealsense/pulls

m-mead · 2024-02-19T18:54:52Z

Hi @MartyG-RealSense, thanks for the response! I submitted a Pull Request with the CUDA changes via #12687

That is good to know about WaitForFrames. The filter functions that call cudaDeviceSynchronize could block for unnecessarily long if the user's application is running CUDA code concurrently in other CUDA streams.
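The scenario can be reproduced with a small standalone program. This is a sketch under assumptions, not SDK code: `spin_kernel` and `quick_kernel` are hypothetical, the user stream is created with cudaStreamNonBlocking, the device is assumed to support concurrent kernel execution, and the spin length in clock cycles is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

// Hypothetical long-running user kernel: busy-waits for a number of GPU clock cycles.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical stand-in for a short SDK filter kernel.
__global__ void quick_kernel() { }

static double elapsed_ms(std::chrono::steady_clock::time_point t0)
{
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::milli>(dt).count();
}

int main()
{
    // User stream that does not implicitly synchronize with the default stream.
    cudaStream_t user_stream;
    if (cudaStreamCreateWithFlags(&user_stream, cudaStreamNonBlocking) != cudaSuccess) return 1;

    // Long-running user work on the user's own stream.
    spin_kernel<<<1, 1, 0, user_stream>>>(1000000000LL);

    // Short "SDK" work on the default stream.
    quick_kernel<<<1, 1>>>();

    // Waits only for the default stream's work.
    auto t0 = std::chrono::steady_clock::now();
    cudaStreamSynchronize(0);
    std::printf("cudaStreamSynchronize(0): %.2f ms\n", elapsed_ms(t0));

    // Waits for everything on the device, including spin_kernel.
    auto t1 = std::chrono::steady_clock::now();
    cudaDeviceSynchronize();
    std::printf("cudaDeviceSynchronize():  %.2f ms\n", elapsed_ms(t1));

    cudaStreamDestroy(user_stream);
    return 0;
}
```

On hardware that runs the two kernels concurrently, the first wait would be expected to return well before the second, which has to wait for the spinning kernel. The exact timings depend on the GPU and its clock rate.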

MartyG-RealSense · 2024-02-20T10:01:32Z

You are very welcome. I have added an Enhancement label to this issue to signify that it should be kept open whilst your Pull Request is active. Thanks again!

m-mead · 2024-03-04T19:45:19Z

The associated pull request (#12687) has been merged so this issue is now resolved.

Arun-Prasad-V · 2024-03-05T03:42:31Z

@m-mead, Thanks for your contribution.

MartyG-RealSense added D400 Series Linux Windows labels Feb 19, 2024

m-mead mentioned this issue Feb 19, 2024

Replace calls to cudaDeviceSynchronize with calls to only synchronize the default CUDA stream #12687

Merged

MartyG-RealSense added the enhancement label Feb 20, 2024

m-mead closed this as completed Mar 4, 2024
