cudaDeviceSynchronize used in SDK filters requires all CUDA streams to complete #12680

Closed
m-mead opened this issue Feb 18, 2024 · 5 comments

Comments

@m-mead

m-mead commented Feb 18, 2024


Required Info
Camera Model: D400
Firmware Version: N/A
Operating System & Version: Linux, Windows
Kernel Version (Linux Only): All
Platform: All
SDK Version: 2.54.2
Language: C and C++
Segment

Issue Description

The RealSense SDK uses cudaDeviceSynchronize to synchronize GPU operations in the color conversion functions and the alignment filter. The problem is that cudaDeviceSynchronize waits for all operations on all streams to complete: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization. From my understanding of the code, the RealSense SDK does not need to wait on all streams, only on the one in which the filtering operations execute. Please correct me if I am wrong. 🙂

A user may be running CUDA code on separate CUDA streams in their application, and if that work is executing concurrently, the cudaDeviceSynchronize call will wait for it to finish as well. One solution is to place the RealSense SDK's CUDA operations on a separate stream. Another is to call cudaStreamSynchronize with an argument of 0, which synchronizes only the default stream that the RealSense SDK uses. Either way, the SDK's CUDA operations would no longer block until other streams complete. The latter is simpler to implement and would not change the stream users expect the RealSense SDK to use.
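To make the two options concrete, here is a minimal sketch of each. It is not the SDK's code: `scale_kernel`, `filter_on_default_stream`, `filter_on_private_stream`, and `run_demo` are hypothetical names invented for illustration, and only the CUDA runtime calls are real. It also assumes the legacy default stream, which implicitly synchronizes with ordinary (blocking) streams, so the private stream is created with `cudaStreamNonBlocking`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for an SDK filter kernel (not the SDK's actual code).
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Option 2 from the text: keep the work on the default stream and wait only on it.
cudaError_t filter_on_default_stream(float* d_data, int n) {
    const int block = 256;
    const int grid = (n + block - 1) / block;
    scale_kernel<<<grid, block>>>(d_data, n, 2.0f);
    cudaError_t err = cudaGetLastError();  // catch launch errors
    if (err != cudaSuccess) return err;
    return cudaStreamSynchronize(0);       // waits on the default stream only
}

// Option 1 from the text: run the work on a dedicated stream and wait on that stream.
cudaError_t filter_on_private_stream(float* d_data, int n, cudaStream_t stream) {
    const int block = 256;
    const int grid = (n + block - 1) / block;
    scale_kernel<<<grid, block, 0, stream>>>(d_data, n, 2.0f);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaStreamSynchronize(stream);
}

// Runs both variants once; returns 0 if every call succeeded and the data is as expected.
int run_demo() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d_data, host, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaStream_t stream;
    if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess) return 1;

    bool ok = filter_on_default_stream(d_data, n) == cudaSuccess &&
              filter_on_private_stream(d_data, n, stream) == cudaSuccess;

    cudaMemcpy(host, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return (ok && host[0] == 4.0f) ? 0 : 1;  // each element was doubled twice
}

int main() {
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

One caveat with the second option: under legacy default-stream semantics, work on stream 0 still orders against streams created with plain `cudaStreamCreate`, so an application probably gets the full benefit only if its own streams are non-blocking or it is built with per-thread default streams.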

I am happy to contribute the changes if the RealSense team is interested. I searched for similar issues and could not find any.

@MartyG-RealSense
Collaborator

Hi @m-mead, the only blocking in the RealSense SDK that I am aware of occurs when the WaitForFrames() instruction is used in a script, as described at #2422 (comment)

You are very welcome to submit a Pull Request (PR) so that your CUDA changes can be considered by the RealSense development team for inclusion in the SDK.

https://github.com/IntelRealSense/librealsense/pulls

@m-mead
Author

m-mead commented Feb 19, 2024

Hi @MartyG-RealSense, thanks for the response! I submitted a Pull Request with the CUDA changes via #12687

That is good to know about WaitForFrames. The filter functions that call cudaDeviceSynchronize could block for unnecessarily long if the user is running CUDA code concurrently on other CUDA streams in the application.
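A rough way to observe this blocking is to time both synchronization calls on the host while a long kernel runs on an application stream. The sketch below is an illustration under assumptions, not a benchmark of the SDK: `spin_kernel`, `short_kernel`, `blocked_ms`, and `run_timing_demo` are hypothetical names, the cycle count is arbitrary, and the application stream is created as non-blocking so that it does not implicitly synchronize with the legacy default stream. Actual timings depend on the GPU, driver, and platform.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical long-running application kernel: spins for a number of GPU clock cycles.
__global__ void spin_kernel(long long cycles) {
    const long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical stand-in for a short SDK filter kernel on the default stream.
__global__ void short_kernel() {}

// Host wall-clock time, in milliseconds, spent inside a synchronization call.
template <typename SyncFn>
double blocked_ms(SyncFn sync) {
    const auto t0 = std::chrono::steady_clock::now();
    sync();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Returns 0 if all CUDA calls succeeded.
int run_timing_demo() {
    cudaStream_t app_stream;
    if (cudaStreamCreateWithFlags(&app_stream, cudaStreamNonBlocking) != cudaSuccess) return 1;

    // Application work on its own stream, then "SDK" work on the default stream.
    spin_kernel<<<1, 1, 0, app_stream>>>(200000000LL);  // arbitrary cycle count
    short_kernel<<<1, 1>>>();
    if (cudaGetLastError() != cudaSuccess) return 1;

    cudaError_t stream_err = cudaSuccess;
    cudaError_t device_err = cudaSuccess;

    // Waits only for work queued on the default stream.
    const double stream_ms = blocked_ms([&] { stream_err = cudaStreamSynchronize(0); });
    // Waits for everything still running on the device, including app_stream.
    const double device_ms = blocked_ms([&] { device_err = cudaDeviceSynchronize(); });

    printf("cudaStreamSynchronize(0): %.3f ms\n", stream_ms);
    printf("cudaDeviceSynchronize():  %.3f ms\n", device_ms);

    cudaStreamDestroy(app_stream);
    return (stream_err == cudaSuccess && device_err == cudaSuccess) ? 0 : 1;
}

int main() { return run_timing_demo(); }
```

On a device that can run the two kernels concurrently, the first call would be expected to return well before the spin kernel finishes, with the second call absorbing the remaining wait.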

@MartyG-RealSense
Collaborator

You are very welcome. I have added an Enhancement label to this issue to signify that it should be kept open whilst your Pull Request is active. Thanks again!

@m-mead
Author

m-mead commented Mar 4, 2024

The associated pull request (#12687) has been merged, so this issue is now resolved.

m-mead closed this as completed Mar 4, 2024
@Arun-Prasad-V
Contributor

@m-mead, Thanks for your contribution.
