Shared arena allocator for CUDA #21577
Replies: 2 comments
-
Thanks @tianleiwu , I've been looking through those docs and any other possible reference to shared allocation I can find. The problem is I can't seem to save any CUDA memory when running 2 sessions with a shared allocator. It's really hard to get a clear picture of exactly what the feature is supposed to do (for example, does it even apply to CUDA, or is it CPU-memory only?) from the docs, so I was hoping that somebody who knows how it works might be able to help me out with a few of these questions.
-
Hi, I want to use the shared arena allocator functionality to reduce VRAM requirements when having multiple sessions loaded, when using the CUDA EP.
I am attempting to call the CreateAndRegisterAllocatorV2() function to enable this. Then, I use the same Ort::Env when loading and running multiple sessions. This does not seem to have any impact on VRAM usage: each successive model that is loaded and executed appears to allocate its own memory, with or without the above configuration enabled.
So, what could be going wrong here? Is there something obvious that I'm missing?
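For reference, a minimal sketch of the setup described above, written against the ONNX Runtime C++ API as I understand it from the headers and docs. The model paths and arena parameters are placeholders, the CreateAndRegisterAllocatorV2 signature is reproduced from memory, and the `session.use_env_allocators` config entry is the documented opt-in that, per the docs, sessions need in order to pick up env-level allocators:

```cpp
// Sketch (not verified end-to-end): register a shared CUDA arena allocator
// on the environment, then create two sessions from that same Ort::Env.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared-alloc-demo");

  // Memory info describing the CUDA device allocation the arena manages.
  Ort::MemoryInfo cuda_mem_info("Cuda", OrtArenaAllocator, /*device_id=*/0,
                                OrtMemTypeDefault);

  // Arena configuration; these values are placeholders (-1 / 0 = defaults).
  Ort::ArenaCfg arena_cfg(/*max_mem=*/0, /*arena_extend_strategy=*/0,
                          /*initial_chunk_size_bytes=*/-1,
                          /*max_dead_bytes_per_chunk=*/-1);

  // Register an allocator for the CUDA EP that sessions of this env can share.
  env.CreateAndRegisterAllocatorV2("CUDAExecutionProvider", cuda_mem_info,
                                   /*options=*/{}, arena_cfg);

  Ort::SessionOptions so;
  so.AppendExecutionProvider_CUDA(OrtCUDAProviderOptions{});
  // Documented opt-in: each session must ask to use env-level allocators.
  so.AddConfigEntry("session.use_env_allocators", "1");

  // Both sessions share the same Ort::Env (paths are placeholders).
  Ort::Session session_a(env, "model_a.onnx", so);
  Ort::Session session_b(env, "model_b.onnx", so);
  return 0;
}
```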
I will only run sessions one at a time, so it seems to me that I should only need the amount of memory required by the session that uses the most memory, although maybe that is too optimistic?
Is there some minimum reproducible example available anywhere that shows the expected behavior I'm talking about?