Describe the issue
Keywords: GPU, model, memory, C++
Describe: I am using ORT 1.12.1 on Linux with an NVIDIA T4. I run inferences one at a time, then stop and watch GPU memory. Memory usage is at some value A once the model is loaded; after running inference it rises to B. But after running for a long time and then stopping, GPU memory does not decrease back to A. Is this expected? It looks as if the GPU memory is never released. For reference, the model I use is float16.
To reproduce
As described above: load the model with the CUDA execution provider, run inference repeatedly, then stop and observe GPU memory (e.g. with nvidia-smi). A minimal sketch follows.
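A minimal sketch of the repro using the ORT C++ API, under stated assumptions: the model path (model_fp16.onnx), the input/output names ("input", "output"), and the 1x3x224x224 shape are placeholders for the real model's I/O, and a float input is assumed (a pure fp16 input would use Ort::Float16_t instead).

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "repro");

  // Attach the CUDA execution provider with default options (device 0).
  Ort::SessionOptions opts;
  OrtCUDAProviderOptions cuda_opts{};
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  // After this point GPU memory sits at value "A" (model loaded on device).
  Ort::Session session(env, "model_fp16.onnx", opts);

  // Build a dummy input tensor on the CPU; ORT copies it to the GPU at Run().
  // Placeholder shape and element type -- substitute the real model's I/O.
  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  std::vector<int64_t> shape{1, 3, 224, 224};
  std::vector<float> data(1 * 3 * 224 * 224, 0.0f);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, data.data(), data.size(), shape.data(), shape.size());

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};

  // Run inference one by one; GPU memory rises to value "B" after the
  // first few runs.
  for (int i = 0; i < 1000; ++i) {
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input, 1,
                               output_names, 1);
  }

  // Stop here and watch nvidia-smi: usage stays at "B" instead of
  // falling back to "A".
  return 0;
}
```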
Urgency
No response
Platform
Linux
OS Version
Ubuntu 14.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.12.1
ONNX Runtime API
C++
Architecture
X86
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.2
Model File
No response
Is this a quantized model?
Yes