Shared arena allocator for CUDA #21577
Replies: 2 comments
-
Thanks @tianleiwu , I've been looking through those docs and any other possible reference to shared allocation I can find. The problem is I can't seem to save any CUDA memory when running 2 sessions with a shared allocator. It's really hard to get a clear picture of exactly what the feature is supposed to do (for example, does it even apply to CUDA, or is it CPU-memory only?) from the docs, so I was hoping that somebody who knows how it works might be able to help me out with a few of these questions.
-
Hi, I want to use the shared arena allocator functionality to reduce VRAM requirements when having multiple sessions loaded, when using the CUDA EP.
I am attempting to call the CreateAndRegisterAllocatorV2() function to enable this. Then, I use the same Ort::Env when loading and running multiple sessions. This does not seem to have any impact on VRAM usage: each successive model that is loaded and executed appears to allocate its own memory, with or without the above configuration enabled.
So, what could be going wrong here? Is there something obvious that I'm missing?
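For reference, a minimal sketch of the setup described above, written against the ONNX Runtime C++ API as I understand it from the headers and docs. The model paths and arena parameters are placeholders, the CreateAndRegisterAllocatorV2 signature is reproduced from memory, and the `session.use_env_allocators` config entry is the documented opt-in that, per the docs, sessions need in order to pick up env-level allocators:

```cpp
// Sketch (not verified end-to-end): register a shared CUDA arena allocator
// on the environment, then create two sessions from that same Ort::Env.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared-alloc-demo");

  // Memory info describing the CUDA device allocation the arena manages.
  Ort::MemoryInfo cuda_mem_info("Cuda", OrtArenaAllocator, /*device_id=*/0,
                                OrtMemTypeDefault);

  // Arena configuration; these values are placeholders (-1 / 0 = defaults).
  Ort::ArenaCfg arena_cfg(/*max_mem=*/0, /*arena_extend_strategy=*/0,
                          /*initial_chunk_size_bytes=*/-1,
                          /*max_dead_bytes_per_chunk=*/-1);

  // Register an allocator for the CUDA EP that sessions of this env can share.
  env.CreateAndRegisterAllocatorV2("CUDAExecutionProvider", cuda_mem_info,
                                   /*options=*/{}, arena_cfg);

  Ort::SessionOptions so;
  so.AppendExecutionProvider_CUDA(OrtCUDAProviderOptions{});
  // Documented opt-in: each session must ask to use env-level allocators.
  so.AddConfigEntry("session.use_env_allocators", "1");

  // Both sessions share the same Ort::Env (paths are placeholders).
  Ort::Session session_a(env, "model_a.onnx", so);
  Ort::Session session_b(env, "model_b.onnx", so);
  return 0;
}
```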
I will only run sessions one at a time, so it seems to me that I should only need the amount of memory required by the session that uses the most memory, although maybe that is too optimistic?
Is there some minimum reproducible example available anywhere that shows the expected behavior I'm talking about?