The RealSense SDK uses `cudaDeviceSynchronize` to synchronize GPU operations. This takes place in the color conversion functions and the alignment filter. The issue with using `cudaDeviceSynchronize` is that it waits for all operations on all streams to complete: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization. From my understanding of the code, the RealSense SDK does not need to wait on all streams, only on the one on which the filtering operations are executing. Please correct me if I am wrong. 🙂
The user may be running CUDA code in separate CUDA streams in their application, and if those operations are executing concurrently, the `cudaDeviceSynchronize` call will also wait for them to finish. A solution would be to either place the RealSense SDK's CUDA operations on a separate stream, or call `cudaStreamSynchronize` with an argument of 0 to synchronize only the default stream, which is the stream the RealSense SDK uses. With either solution, SDK CUDA operations would no longer block until work on other streams completes. The latter is simpler to implement and would not change the stream users expect the RealSense SDK to use.
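To illustrate the difference, here is a minimal standalone sketch, not code from the RealSense SDK. The kernels `sdk_like_kernel` and `user_kernel`, the `CUDA_CHECK` macro, and the buffer sizes are all hypothetical stand-ins. It assumes the user's stream is created with `cudaStreamNonBlocking`, so that it does not implicitly synchronize with the legacy default stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Stand-in for a short SDK kernel (e.g. a color conversion step).
__global__ void sdk_like_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Stand-in for long-running application work on the user's own stream.
__global__ void user_kernel(float* data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* sdk_buf = nullptr;
    float* user_buf = nullptr;
    CUDA_CHECK(cudaMalloc(&sdk_buf, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&user_buf, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(sdk_buf, 0, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(user_buf, 0, n * sizeof(float)));

    // Assumption: the application uses a non-blocking stream.
    cudaStream_t user_stream;
    CUDA_CHECK(cudaStreamCreateWithFlags(&user_stream, cudaStreamNonBlocking));

    // Long-running application work on the user's stream.
    user_kernel<<<blocks, threads, 0, user_stream>>>(user_buf, n, 10000);
    CUDA_CHECK(cudaGetLastError());

    // SDK-style work on the default stream.
    sdk_like_kernel<<<blocks, threads>>>(sdk_buf, n);
    CUDA_CHECK(cudaGetLastError());

    // Waits only for work queued on the default stream (stream 0).
    // cudaDeviceSynchronize() here would also wait for user_kernel.
    CUDA_CHECK(cudaStreamSynchronize(0));
    std::printf("default-stream work finished\n");

    // The application synchronizes its own stream when it needs the result.
    CUDA_CHECK(cudaStreamSynchronize(user_stream));
    std::printf("user-stream work finished\n");

    CUDA_CHECK(cudaStreamDestroy(user_stream));
    CUDA_CHECK(cudaFree(sdk_buf));
    CUDA_CHECK(cudaFree(user_buf));
    return 0;
}
```

One caveat: unless the code is compiled with per-thread default streams, stream 0 is the legacy default stream, which implicitly synchronizes with streams created without `cudaStreamNonBlocking`. For such blocking streams, the benefit of `cudaStreamSynchronize(0)` is limited to work the application enqueues after the SDK's default-stream work.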
I am happy to help contribute the changes if the RealSense team is interested. I searched for similar issues and could not find any.
Hi @m-mead The only blocking taking place in the RealSense SDK that I am aware of is if the WaitForFrames() instruction is used in a script, as described at #2422 (comment)
You are very welcome to submit a Pull Request (PR) so that your CUDA changes can be considered by the RealSense development team for inclusion in the SDK.
Hi @MartyG-RealSense, thanks for the response! I submitted a Pull Request with the CUDA changes via #12687
That is good to know about WaitForFrames. The case I have in mind is different: the filter functions that call `cudaDeviceSynchronize` could block for unnecessarily long if the user is running CUDA code concurrently in other CUDA streams in the application.
You are very welcome. I have added an Enhancement label to this issue to signify that it should be kept open whilst your Pull Request is active. Thanks again!