I am wondering what has happened and whether we can do something about it. Is this some kind of memory pool with a larger size? Can we reduce this size if we want to? I noticed this issue with a model that previously fit into my GPU; it now reports out of memory when I offload all layers to the GPU. @slaren, is it possible that this has something to do with the work you have done recently on managing GPU memory?
Also, will selecting the LLAMA_CUDA_F16 option at compile time decrease GPU memory use during inference?
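For reference, this is roughly how I enable the option at configure time (a sketch only; the flag that turns on the CUDA backend was named LLAMA_CUBLAS in builds from around this period and may differ in other versions):

```sh
# Configure with CUDA enabled and the F16 option turned on, then build.
cmake -B build -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON
cmake --build build --config Release
```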
I don't remember doing anything recently related to managing GPU memory, and I am not aware of any changes that could cause VRAM usage to increase. Try running a bisect to find the commit that introduced the issue.
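A minimal bisect run could look like the sketch below, rebuilding and checking VRAM usage at each step (the known-good commit is a placeholder for whichever revision still used 11 GB):

```sh
git bisect start
git bisect bad HEAD                  # current build shows the higher VRAM usage
git bisect good <known-good-commit>  # last revision known to use ~11 GB
# After each checkout: rebuild, load the model, and mark the result with
#   git bisect good   (VRAM still ~11 GB)   or   git bisect bad   (VRAM grew)
git bisect reset                     # return to the original branch when done
```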
I was just doing that. I had an "old" version on my computer from 1/3/2024 12:39 (I don't know the commit number); that version uses 11 GB with my model, while the latest version uses more than 13 GB.
What I meant above is your work in #6170 (March 20), which falls between 1/3/2024 and today.