Any tips for keeping frames in GPU when using the python wrapper? #7824
Comments
Hi @smartin015 I researched your question deeply. Accessing the SDK's C++ implementation of GLSL from Python did not seem to be practical. I also investigated the possibility of applying GLSL in Python from outside of the SDK, perhaps through Pyglet (since the Python wrapper has a Pyglet viewer example). An example of Python, Pyglet and GLSL: https://www.pythonstuff.org/glsl/example_2_glsl_with_pyglet.html

It may be best, though, to instead tackle the suspected root cause of your problem: the conversion of SDK frames to numpy. This is something that some RealSense Python users have been experimenting with for a while (both frame-to-numpy and numpy-to-frame). There is no definitive solution at the time of writing, though the subject has been referred to Intel to investigate, according to @RealSenseSupport.
Looking around a bit, I found the CUDA array interface spec from Numba, which is also used in CuPy, PyTorch, JAX, etc. It sounds as though, if realsense frame objects exposed this standard interface, they could be passed straight to CUDA-aware libraries without an extra copy.
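For illustration, this is what that interface looks like on a Numba device array, and how an object exposing it can be consumed without a copy. A minimal sketch, assuming Numba and a working CUDA device; pyrealsense2 frames do not expose this attribute, which is exactly the gap being discussed:

```python
# Sketch of the CUDA Array Interface (assumes Numba and a CUDA device).
# 'd_arr' is an ordinary Numba device array standing in for a hypothetical
# GPU-resident frame object.
import numpy as np
from numba import cuda

d_arr = cuda.to_device(np.zeros((480, 640), dtype=np.uint16))

# The standard attribute: shape, typestr, a device pointer, and a version number.
print(d_arr.__cuda_array_interface__)

# Any object exposing that attribute can be wrapped without copying the data.
if cuda.is_cuda_array(d_arr):
    view = cuda.as_cuda_array(d_arr)   # zero-copy device-side view
```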
I do not personally have knowledge of the advanced workings of the CUDA support in librealsense, so I cannot offer an educated opinion on that subject. In regard to a data pointer though, librealsense's frames have a frame handle that acts as a 'smart pointer'. Data can be retrieved from the frame handle with get_data() (the instruction that you already mentioned):
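A minimal sketch of that path, assuming pyrealsense2 is installed; the conversion lands in ordinary host (CPU) memory:

```python
# Minimal sketch of the get_data() -> NumPy conversion (assumes pyrealsense2).
import numpy as np
import pyrealsense2 as rs

def frame_to_numpy(depth_frame: rs.depth_frame) -> np.ndarray:
    # get_data() returns a buffer view of the frame's backing storage;
    # np.asanyarray() wraps it as a (height, width) uint16 array in host memory.
    return np.asanyarray(depth_frame.get_data())
```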
For this use case, get_data -> numpy array -> CUDA is the way to go. I don't see benefits to using ::gl in this case - depth frames originate in main memory, so something has to copy them to GPU memory, and GLSL will not do that faster than the built-in CUDA methods.
With the Jetson Nano there's actually no such thing as main vs. GPU memory - the CPU and GPU physically share the same system memory. But if librealsense, numpy, and numba/CUDA don't know about it (which is what I'm assuming), it causes an unnecessary copy to "load the data onto the GPU", which wastes time and space in memory. Efficiently using the shared system memory apparently requires allocating "mapped" pinned memory (e.g. cudaHostAlloc with the cudaHostAllocMapped flag) instead of ordinary allocations.

Ignoring the ability to do any of this in Python for the moment... is there a way to explicitly create the frame buffer and configure the pipeline to use it? That would allow me to use the shared memory directly in my CUDA kernels without any extra copies.
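For reference, Numba does expose this kind of allocation from Python; a short sketch, assuming a CUDA device is present (librealsense is not involved in these allocations):

```python
# Sketch of pinned and "mapped" pinned allocations from Python via Numba
# (assumes a CUDA device; on Jetson this memory is physically shared with the CPU).
import numpy as np
from numba import cuda

# Page-locked host memory (faster transfers, but still needs an explicit copy):
pinned = cuda.pinned_array((480, 640), dtype=np.uint16)

# Page-locked host memory that is also mapped into the device address space:
# kernels can read and write it directly, with no separate host-to-device copy.
mapped = cuda.mapped_array((480, 640), dtype=np.uint16)
```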
@smartin015 I must defer again to the CUDA expertise of @dorodnic on this matter.
I also stumbled across the ENABLE_ZERO_COPY build option - could that help here?
@smartin015 I located information from @dorodnic about ENABLE_ZERO_COPY: "Zero Copy feature is for now not functional. The idea was that rs2::frame object could track the underlying Kernel resource instead of making a copy, but this does not always play well with the rest of the SDK. We might re-enable it at some point, but for now there seem to be little need for it".
Hey Marty - I'm still blocked on what I'm trying to do: use librealsense2 with mapped, pinned memory to eliminate unnecessary copying of depth frames on the NVIDIA Jetson Nano.
Hi @smartin015 Considering that progress could not be made the last time this issue was looked at, do you wish to continue with it? Thanks!
I suppose not. It's unfortunate the realsense library doesn't support this optimization, but I don't know where to start and it sounds like there's no interest in implementing it. I'll go ahead and close the issue.
One last update for folks who might come across this thread - I managed to speed up my code 10x (!) by following the conversation to add CUDA UVM support to Numba. The trick was in replacing my device-memory allocation and transfer calls with Numba's mapped-memory equivalents, and then running my code as normal. So there's still a copy happening to get from librealsense into numba land, but it's apparently not the expensive one(s) that fake loading into and out of GPU memory.
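Since the exact calls were not preserved in this thread, the following is only a plausible reconstruction of that swap using Numba's mapped arrays, not the author's code:

```python
# Hypothetical before/after of the swap described above (an assumption, since the
# original snippet was not preserved): a per-frame cuda.to_device() copy is replaced
# by writing into a mapped (pinned, device-visible) array that is allocated once.
import numpy as np
from numba import cuda

H, W = 480, 640

# Before: every frame triggers an explicit host-to-device transfer.
def process_with_copy(depth_np, kernel):
    d_depth = cuda.to_device(depth_np)
    kernel[(30, 40), (16, 16)](d_depth)
    cuda.synchronize()

# After: one mapped allocation, reused; on the Jetson Nano the GPU reads it in place.
depth_mapped = cuda.mapped_array((H, W), dtype=np.uint16)

def process_mapped(depth_np, kernel):
    depth_mapped[:, :] = depth_np        # plain host memcpy into shared memory
    kernel[(30, 40), (16, 16)](depth_mapped)
    cuda.synchronize()
```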
Thanks so much for sharing your solution @smartin015 :)
Hey all,
I'm doing some custom CUDA processing using a RealSense D435 and an NVIDIA Jetson Nano - I have an example which works here.
I use numba for JIT compilation of the CUDA kernel, and would like to pass the incoming depth frame to the kernel without first copying out of the GPU. To outline this, it's roughly:
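(A reconstructed sketch of that outline, with a placeholder kernel and assumed stream settings rather than the code from the linked example:)

```python
# Reconstructed sketch of the rough loop described above (placeholder kernel and
# launch sizes; assumes a 640x480 z16 depth stream). Not the author's exact code.
import numpy as np
import pyrealsense2 as rs
from numba import cuda

@cuda.jit
def depth_kernel(depth, out):
    i, j = cuda.grid(2)
    if i < depth.shape[0] and j < depth.shape[1]:
        out[i, j] = depth[i, j]          # placeholder for the real per-pixel work

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
try:
    for _ in range(100):
        frames = pipeline.wait_for_frames()
        depth = np.asanyarray(frames.get_depth_frame().get_data())   # frame -> numpy
        d_depth = cuda.to_device(depth)                              # host -> device copy
        d_out = cuda.device_array_like(d_depth)
        depth_kernel[(30, 40), (16, 16)](d_depth, d_out)
        result = d_out.copy_to_host()                                # device -> host copy
finally:
    pipeline.stop()
```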
I suspect that my call to get_data() to get the data and convert to numpy is causing an extra copy out of GPU memory that could be avoided. I saw #7816 and the :gl namespace in C++, but that doesn't appear to be accessible in the python wrapper in order to pass the mapped memory into the kernel. Can you please advise? Thanks!