I am wondering what has happened and whether we can do something about it. Is this some kind of memory pool with a larger size? Can we reduce this size if we want to? I noticed this issue with a model that previously fit into my GPU; it now reports out of memory when I offload all layers to the GPU. @slaren, is it possible that this has something to do with the work you have done recently on managing GPU memory?
Also, will selecting the LLAMA_CUDA_F16 option at compile time decrease GPU memory use during inference?
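For reference, this is roughly how I enable the option at configure time (a sketch only; the flag that turns on the CUDA backend was named LLAMA_CUBLAS in builds from around this period and may differ in other versions):

```sh
# Configure with CUDA enabled and the F16 option turned on, then build.
cmake -B build -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON
cmake --build build --config Release
```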
I don't remember doing anything recently related to managing GPU memory, and I am not aware of any changes that could cause VRAM usage to increase. Try running a bisect to find the commit that introduced the issue.
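A minimal bisect run could look like the sketch below, rebuilding and checking VRAM usage at each step (the known-good commit is a placeholder for whichever revision still used 11 GB):

```sh
git bisect start
git bisect bad HEAD                  # current build shows the higher VRAM usage
git bisect good <known-good-commit>  # last revision known to use ~11 GB
# After each checkout: rebuild, load the model, and mark the result with
#   git bisect good   (VRAM still ~11 GB)   or   git bisect bad   (VRAM grew)
git bisect reset                     # return to the original branch when done
```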
I was just doing that. I had an "old" version on my computer from 1/3/2024 12:39 (I don't know the commit number); that version uses 11 GB with my model, while the latest version uses more than 13 GB.
What I meant above is your work in #6170 (March 20), which falls between 1/3/2024 and today.