[Q] Memory Requirements for Different Model Sizes #13
Since the original models use FP16 and llama.cpp quantizes to 4-bit, the memory requirements are around 4 times smaller than the original:
On an M1 Max with 64 GB RAM, 4-bit 65B: 38.5 GB, 850 ms per token.
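These figures line up with a back-of-the-envelope estimate. A minimal sketch (the nominal parameter counts and the ~4.5 effective bits per weight for ggml's 4-bit format are my assumptions, not numbers from this thread):

```python
# Rough weights-only memory estimate at different precisions.
# Parameter counts are nominal (e.g. "30B" actually has slightly more);
# ggml's 4-bit quantization stores a bit more than 4 bits per weight once
# per-block scale factors are included (~4.5 bits is an assumed figure).
PARAMS = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """GiB needed for the weights alone (no KV cache or scratch buffers)."""
    return n_params * bits_per_weight / 8 / 2**30

for name, n in PARAMS.items():
    print(f"{name}: FP16 ~{weights_gib(n, 16):.0f} GiB, "
          f"4-bit ~{weights_gib(n, 4.5):.1f} GiB")
```

For 65B this gives roughly 34 GiB of weights; the 38.5 GB observed above also includes the KV cache and runtime buffers, so the two are consistent.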
For the record: Intel® Core™ i5-7600K CPU @ 3.80GHz × 4, 16 GB RAM, under Ubuntu, the 13B model runs with acceptable response time. Note that, as mentioned in previous comments, the -t 4 parameter gives the best results. Great work!
Should add these to the README.
@prusnak is that PC RAM or GPU VRAM?
llama.cpp runs on the CPU, not the GPU, so it's PC RAM.
Is it possible that at some point we will get a video card version?
I don't think so. You can run the original Whisper model on a GPU: https://github.com/openai/whisper
FWIW, running on my M2 MacBook Air with 8 GB of RAM comes to a grinding halt. On first run, about 2-3 minutes of a completely unresponsive machine (mouse and keyboard locked), then about 10-20 seconds per response word. I didn't expect great response times, but that's a bit slower than anticipated.
Close every other app, and ideally reboot to a clean state. This should help. If you see an unresponsive machine, then it is swapping memory to disk. 8 GB is not that much, especially if you have browsers, Slack, etc. running.
Also make sure you're using 4 threads instead of 8: you don't want to be using any of the 4 efficiency cores.
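For illustration, here's one way to automate that choice on Apple Silicon. A small sketch (assumes macOS; the `hw.perflevel0.physicalcpu` sysctl key reports the performance-core count on recent macOS, and the fallback value is arbitrary):

```python
import subprocess

def performance_cores(default: int = 4) -> int:
    """Performance-core count on Apple Silicon via sysctl (macOS only).

    Falls back to `default` if the key is missing or the call fails."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
        return default

# Pass the result to llama.cpp as its thread-count flag, e.g. -t 4.
print(f"-t {performance_cores()}")
```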
Requirements added in #269.
32 GB is probably a little too optimistic: I have 32 GB of DDR4 clocked at 3600 MHz and it generates a token every 2 minutes.
Yeah, 38.5 GB is more realistic, since the whole model is loaded into memory as of now. See https://github.com/ggerganov/llama.cpp#memorydisk-requirements for current values.
I see. That makes more sense, since you mention the whole model is loaded into memory as of now. Linux would probably run better in this case, thanks to its better swap handling and lower memory usage. Thanks!
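As a rough illustration of why not fitting in RAM is so punishing (my own arithmetic, not from the thread): generating a token reads every weight once, so whatever part of the model is not resident in RAM must come from disk on every single token. A lower-bound sketch (the disk throughput is an assumed figure):

```python
# Hypothetical lower bound on per-token latency once the model spills to disk.
# Assumes each token touches all weights once and only the non-resident part
# is re-read; real swap thrashing with random access is far slower.
model_gb = 38.5   # 4-bit 65B, figure from this thread
ram_gb = 32.0     # total RAM; the usable amount is lower still
disk_gbps = 2.0   # assumed SSD sequential read speed, GB/s

spill_gb = max(0.0, model_gb - ram_gb)
print(f"at least {spill_gb:.1f} GB read from disk per token "
      f"=> >= {spill_gb / disk_gbps:.1f} s/token before any compute")
```

That floor is a few seconds per token even under ideal sequential reads; the observed 2 minutes per token is consistent with the much slower random-access pattern of actual swapping.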
What languages does it work with? Does it work with the same input and output languages as GPT?
We actually build a dylib on macOS.