
[Performance] #13500

Open
igormis opened this issue Oct 29, 2022 · 10 comments

Labels
ep:CUDA issues related to the CUDA execution provider

Comments

igormis commented Oct 29, 2022

Describe the issue

GPU RAM gets exhausted during inference (after a certain number of calls); memory usage keeps increasing unpredictably.

To reproduce

Reproduction instructions

import onnxruntime as rt

# Disable memory pattern optimization and the CPU memory arena
sess_options = rt.SessionOptions()
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False

# Grow the CUDA arena only by the requested amount instead of doubling
cuda_provider_options = {"arena_extend_strategy": "kSameAsRequested"}
execution_providers = [("CUDAExecutionProvider", cuda_provider_options), "CPUExecutionProvider"]

sess = rt.InferenceSession("roberta_onnx_model/__MODEL_PROTO.onnx", sess_options, providers=execution_providers)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
logits = sess.run([label_name], {input_name: X_test_sample})[0]
...
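
Not part of the original report, but one way to observe the growth is to sample GPU memory after each call; the pynvml-based loop below is an illustrative assumption, not something from this issue:

# Hypothetical helper to watch the leak: sample GPU memory after every call.
# pynvml (pip install nvidia-ml-py) is an assumption for illustration.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for i in range(1000):
    sess.run([label_name], {input_name: X_test_sample})
    used_mib = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024**2
    print(f"call {i}: {used_mib:.0f} MiB used")  # a steady climb points at arena growth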

Urgency

No response

Platform

Linux

OS Version

20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-gpu==1.11.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4

Model File

The model is a TensorFlow model (RoBERTa) exported to ONNX.

Is this a quantized model?

Unknown

github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) label on Oct 29, 2022
igormis (Author) commented Oct 31, 2022

Thanks @yuslepukhin.
It is still increasing. For instance, I started with 3 models:
[screenshot: initial GPU memory usage]
and after a certain number of requests the GPU RAM gets exhausted:
[screenshot: GPU memory exhausted]

igormis (Author) commented Nov 1, 2022

I have also tried the following options:

cuda_provider_options = {
    "arena_extend_strategy": "kSameAsRequested",
    "gpu_mem_limit": 3221225472,  # 3 GiB cap
    "cudnn_conv_use_max_workspace": "1",
}

but memory still grows.

pranavsharma (Contributor) commented Nov 4, 2022

The arena allocator used by the CUDA EP doesn't shrink by itself. You can configure it to shrink after every Run. See the Memory arena shrinkage section here https://onnxruntime.ai/docs/get-started/with-c.html. Also read #9509 (comment).
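
In Python, a minimal sketch of what this looks like (assuming a session sess and inputs as in the code above); the config key string is the value behind the C++ constant kOrtRunOptionsConfigEnableMemoryArenaShrinkage:

run_options = rt.RunOptions()
# Devices to shrink are semicolon-separated, e.g. "cpu:0;gpu:0".
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
logits = sess.run([label_name], {input_name: X_test_sample}, run_options)[0]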

igormis (Author) commented Nov 5, 2022

@pranavsharma thanks a lot, that's the perfect direction. I have these settings for the EPs in Python:

sess_options = rt.SessionOptions()
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False

cuda_provider_options = {"arena_extend_strategy": "kSameAsRequested", "cudnn_conv_use_max_workspace": "1"}
cpu_provider_options = {"arena_extend_strategy": "kSameAsRequested"}
# Pair each EP with its options and pass sess_options; otherwise
# cpu_provider_options and sess_options are silently unused.
execution_providers = [("CUDAExecutionProvider", cuda_provider_options), ("CPUExecutionProvider", cpu_provider_options)]
sess = rt.InferenceSession("roberta_onnx_model/__MODEL_PROTO.onnx", sess_options, providers=execution_providers)

So does this mean that for the RunOptions I need to do something like this?

run_options = rt.RunOptions()
# The config key is the *value* behind the C++ constant
# kOrtRunOptionsConfigEnableMemoryArenaShrinkage, not the constant's name.
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
logits = sess.run([label_name], {input_name: X_test_sample}, run_options)[0]

@pranavsharma is this correct? I read the documentation, but I'm still a little confused.

pranavsharma (Contributor) commented
Yeah, that sounds right. Please look at the test code as well.

TEST(CApiTest, ConfigureCudaArenaAndDemonstrateMemoryArenaShrinkage) {

igormis (Author) commented Nov 28, 2022

The settings reduced the memory growth, but it still increases after a certain time. Is it possible to cap it at a fixed size somehow?

igormis (Author) commented Nov 28, 2022

After playing with the code and following the test code you sent me, these are the final settings:

sess_options = rt.SessionOptions()

cuda_provider_options = {"arena_extend_strategy": "kNextPowerOfTwo", "do_copy_in_default_stream": False}
# do_copy_in_default_stream is a CUDA EP option; the CPU EP only takes the arena strategy.
cpu_provider_options = {"arena_extend_strategy": "kNextPowerOfTwo"}

execution_providers = [("CUDAExecutionProvider", cuda_provider_options), ("CPUExecutionProvider", cpu_provider_options)]
sess = rt.InferenceSession("roberta_onnx_model/__MODEL_PROTO.onnx", sess_options, providers=execution_providers)

run_options = rt.RunOptions()
# Shrink both arenas after every Run; devices are semicolon-separated.
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "cpu:0;gpu:0")

logits = sess.run([label_name], {input_name: X_test_sample}, run_options)[0]

I don't know what else to set or change.
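
For a hard upper bound, the gpu_mem_limit option tried earlier in this thread can be combined with per-Run shrinkage; a sketch under those assumptions (3221225472 bytes = 3 GiB), not a confirmed fix:

cuda_provider_options = {
    "arena_extend_strategy": "kSameAsRequested",
    "gpu_mem_limit": 3221225472,  # cap the CUDA arena at 3 GiB
}
execution_providers = [("CUDAExecutionProvider", cuda_provider_options), "CPUExecutionProvider"]
sess = rt.InferenceSession("roberta_onnx_model/__MODEL_PROTO.onnx", providers=execution_providers)

run_options = rt.RunOptions()
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
logits = sess.run([label_name], {input_name: X_test_sample}, run_options)[0]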

bluddy commented Nov 6, 2023

I'd like to know whether this really limits the growth of memory usage, especially on the GPU.
We're making serious use of onnxruntime and need to know if we can rely on it in a Python-based system.

niyathimariya commented
Hi, is there any update on this issue?
