How to release GPU memory while keeping the onnxruntime session around #9509

Open
Z-XQ opened this issue Oct 23, 2021 · 12 comments
Labels
ep:CUDA (issues related to the CUDA execution provider), feature request (request for unsupported feature or enhancement)

Comments

@Z-XQ

Z-XQ commented Oct 23, 2021

I want to release GPU memory promptly while keeping the session alive. Thank you!

@hariharans29
Member

Can you please elaborate on your scenario? What exactly do you mean by "release GPU memory in time and keep the session running"? Do you mean you want to shrink any GPU memory arena associated with the session periodically while still keeping the session alive?

hariharans29 added the ep:CUDA label Oct 25, 2021
Z-XQ closed this as completed Oct 25, 2021
Z-XQ reopened this Oct 25, 2021
@Z-XQ
Author

Z-XQ commented Oct 25, 2021

> Can you please elaborate on your scenario? What exactly do you mean by "release GPU memory in time and keep the session running"? Do you mean you want to shrink any GPU memory arena associated with the session periodically while still keeping the session alive?

Thank you for your reply! Here is my scenario.

My GPU is a 3090. 708 MB of GPU memory is in use before I open an onnxruntime session.
Then I open a session with:
ort_session = onnxruntime.InferenceSession(model_path)
GPU memory usage rises to about 1.7 GB.

When I run inference on one image as follows, GPU memory usage rises to about 2.0 GB, and it does not go back down after the inference is finished.
seg_raw_output = tmp.run([], {self.seg_input_name: seg_input_data})[0]

Therefore, I want to release some of the GPU memory so that usage returns to 1.7 GB while still keeping the session alive.

Thank you! I am looking forward to your reply!

@hariharans29
Member

Thanks @Z-XQ for the explanation.

The GPU memory is backed by a memory pool (arena), and we have a config knob to shrink the arena (de-allocate unused memory chunks).

I'm not sure we have enough tools to accomplish this in Python just yet. The best way to use this feature in C++ is to:

  1. Not allocate weights memory through the arena: see here.

  2. Configure the arena with a high enough initial chunk to service most Run() calls: see "initial_chunk_size_bytes" here.

  3. Finally, configure the arena to shrink on every Run(): see here. This keeps the initial chunk allocated but de-allocates any unused chunks remaining after the Run() call ends.

For example, if the initial chunk size is set to 500MB, the first Run() will allocate 500MB plus any additional chunks required to service that Run() call. The additional chunks are de-allocated after Run(), leaving only the 500MB initial chunk allocated. It is important not to allocate weights (initializers) memory through the arena, as that complicates the shrinkage. Hence, step (1).
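A rough Python sketch of steps (1) and (3), assuming a recent onnxruntime-gpu build that exposes these string config entries; the model path, input name, and input shape are placeholders, and step (2) is omitted here because "initial_chunk_size_bytes" is set through the C/C++ arena config. Whether usage actually drops back to the baseline still depends on the build and on fragmentation.

import numpy as np
import onnxruntime as ort

# Step (1): keep initializers (weights) out of the arena so shrinking it is simpler.
so = ort.SessionOptions()
so.add_session_config_entry("session.use_device_allocator_for_initializers", "1")

sess = ort.InferenceSession("model.onnx", sess_options=so,
                            providers=["CUDAExecutionProvider"])

# Step (3): request an arena shrink on GPU device 0 at the end of every Run(),
# de-allocating unused chunks beyond the initial one.
ro = ort.RunOptions()
ro.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")

dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder shape
outputs = sess.run(None, {"input": dummy_input}, run_options=ro)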

@Z-XQ
Author

Z-XQ commented Oct 26, 2021

> Thanks @Z-XQ for the explanation.
>
> The GPU memory is backed by a memory pool (arena), and we have a config knob to shrink the arena (de-allocate unused memory chunks).
>
> I'm not sure we have enough tools to accomplish this in Python just yet. The best way to use this feature in C++ is to:
>
>   1. Not allocate weights memory through the arena: see here.
>   2. Configure the arena with a high enough initial chunk to service most Run() calls: see "initial_chunk_size_bytes" here.
>   3. Finally, configure the arena to shrink on every Run(): see here. This keeps the initial chunk allocated but de-allocates any unused chunks remaining after the Run() call ends.
>
> For example, if the initial chunk size is set to 500MB, the first Run() will allocate 500MB plus any additional chunks required to service that Run() call. The additional chunks are de-allocated after Run(), leaving only the 500MB initial chunk allocated. It is important not to allocate weights (initializers) memory through the arena, as that complicates the shrinkage. Hence, step (1).

Thanks a lot! It's too hard to convert to a C++ deployment in a short time. I'll look for other ways to do this from Python.

@hariharans29
Member

Thanks. We will need to support configuring the arena in Python, so I will mark this as an enhancement.

hariharans29 added the feature request label Oct 26, 2021
@zlbdzhh

zlbdzhh commented May 11, 2022

When I followed the steps you described on a multi-GPU machine, I got this error: "Did not find an arena based allocator registered for device-id combination in the memory arena shrink list: gpu:1". How can I fix it?

@codender

> Thanks. We will need to support configuring the arena in Python, so I will mark this as an enhancement.

Hi, can I release GPU memory in Python now?

@chinmayjog13

Hi, any update on this issue?

@yingfenging

@hariharans29 Hi, can I release GPU memory in Python now?

@bluddy

bluddy commented Nov 6, 2023

I'd like to know if there is a way to limit the growth of memory usage, especially on the GPU. We're making serious use of onnxruntime and need to know whether we can rely on it in a Python-based system.
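One knob that is available from Python for capping GPU arena growth is the CUDA execution provider's provider options. The sketch below is only an illustration, assuming the installed onnxruntime-gpu build supports the documented "gpu_mem_limit" and "arena_extend_strategy" options (the model path and the 2 GB budget are placeholders); it caps how far the arena can grow rather than releasing memory that is already held.

import onnxruntime as ort

# Cap the CUDA memory arena at ~2 GB and grow it only by what each request needs,
# instead of the default power-of-two extension strategy.
cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,   # bytes; placeholder budget
    "arena_extend_strategy": "kSameAsRequested",
}

sess = ort.InferenceSession(
    "model.onnx",
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)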

@dawenxi-only

Is there any update? I am also facing the issue of freeing memory from onnxruntime in Python.

@yiluzhuimeng

Is there any update? I am facing the issue of freeing memory from onnxruntime in Python, too.
