[Performance] #13500
Comments
Thanks @yuslepukhin.
I have tried the following code also:
The arena allocator used by the CUDA EP doesn't shrink by itself. You can configure it to shrink after every Run. See the `memory.enable_memory_arena_shrinkage` run config option.
@pranavsharma thanks a lot, that's the perfect direction. I have these settings for the EP in Python:
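For reference, a minimal sketch of what such CUDA EP settings look like via the Python API; the device id and the 2 GiB arena cap below are illustrative assumptions, not the poster's original values:

```python
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,  # assumed device
        # Grow the arena only by the requested amount (instead of the
        # default power-of-two doubling) so shrinkage can return memory.
        "arena_extend_strategy": "kSameAsRequested",
        # Upper bound on the arena, in bytes (2 GiB is an assumption).
        "gpu_mem_limit": 2 * 1024 * 1024 * 1024,
    }),
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)
```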
So does this mean that for the RunOptions I need to do something like this:
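A minimal sketch of the per-Run shrinkage request using the public Python API; the input name and feed below are placeholders for the actual model inputs:

```python
import numpy as np
import onnxruntime as ort

run_options = ort.RunOptions()
# Ask ORT to shrink the CUDA arena on device 0 after this Run;
# "cpu:0;gpu:0" would shrink both the CPU and GPU arenas.
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")

# "input_ids" is a placeholder; use the session's real input names.
feed = {"input_ids": np.zeros((1, 128), dtype=np.int64)}
outputs = session.run(None, feed, run_options)
```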
@pranavsharma is this correct? I read the documentation, but I'm still a little bit confused.
Yeah, that sounds right. Please look at the test code as well.
The settings reduced the memory growth, but it still increases after a certain time. Is it possible to make it fixed somehow?
After playing with the code and following the examples you sent me, these are the final settings:
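A plausible reconstruction of such final settings, combining the EP options and the per-Run shrinkage sketched above; the original snippet was not preserved in this thread, so every concrete value here is an assumption:

```python
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,                               # assumed device
        "arena_extend_strategy": "kSameAsRequested",  # grow only as needed
        "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # assumed 2 GiB cap
    }),
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)

run_options = ort.RunOptions()
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
# Pass run_options to every session.run(...) call so the arena is
# shrunk back after each inference.
```

With `kSameAsRequested` the arena grows only by what each Run actually needs, so shrinkage after the Run can free unused chunks, and `gpu_mem_limit` acts as a hard ceiling on the arena's size.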
I do not know what else to set/change.
I'd like to know if this really limits the growth of memory usage, especially on the GPU.
Hi, is there any update on this issue?
Describe the issue
GPU RAM gets exhausted during inference (after a certain number of calls), i.e. memory usage keeps increasing unpredictably over time.
To reproduce
Reproduction instructions
Urgency
No response
Platform
Linux
OS Version
20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu==1.11.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4
Model File
The model is a TensorFlow model (RoBERTa).
Is this a quantized model?
Unknown