
Segfault / Memory error with 65B model (128GB RAM) #12

Closed
turbo opened this issue Mar 11, 2023 · 2 comments
Labels
build Compilation issues

Comments


turbo commented Mar 11, 2023

On an M1 Ultra / 128GB, running the 65B model:

./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p "The word empowerment has five possible definitions:"

produces this error after everything has been loaded correctly:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268478672, available 268435456)

The 30B model runs fine (even on a 64GB M1 Max).

Full output
(base) ➜  llama.cpp git:(master) ✗ ./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p "The word empowerment has five possible definitions:"
main: seed = 1678533057
llama_model_load: loading model from './models/65B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 8192
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 64
llama_model_load: n_layer = 80
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 22016
llama_model_load: n_parts = 8
llama_model_load: ggml ctx size = 41477.73 MB
llama_model_load: memory_size =  2560.00 MB, n_mem = 40960
llama_model_load: loading model part 1/8 from './models/65B/ggml-model-q4_0.bin'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 2/8 from './models/65B/ggml-model-q4_0.bin.1'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 3/8 from './models/65B/ggml-model-q4_0.bin.2'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 4/8 from './models/65B/ggml-model-q4_0.bin.3'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 5/8 from './models/65B/ggml-model-q4_0.bin.4'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 6/8 from './models/65B/ggml-model-q4_0.bin.5'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 7/8 from './models/65B/ggml-model-q4_0.bin.6'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 8/8 from './models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723

main: prompt: 'The word empowerment has five possible definitions:'
main: number of tokens in prompt = 11
   1 -> ''
1576 -> 'The'
1734 -> ' word'
3710 -> ' emp'
1680 -> 'ower'
 358 -> 'ment'
 756 -> ' has'
5320 -> ' five'
1950 -> ' possible'
15848 -> ' definitions'
29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268478672, available 268435456)
[1]    54172 segmentation fault  ./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p
ggerganov (Owner) commented Mar 11, 2023

This was fixed here: 7d9ed7b

Just pull, run make, and it should be good.

turbo (Author) commented Mar 11, 2023

Nice, that worked 🥳

main: mem per token = 71159620 bytes
main:     load time = 15499.67 ms
main:   sample time =   277.16 ms
main:  predict time = 39353.45 ms / 285.17 ms per token
main:    total time = 56266.87 ms
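As a back-of-the-envelope check (assuming the per-token figure covers the predicted tokens), the timings above imply roughly:

```python
# Rough throughput implied by the timing lines above.
predict_ms = 39353.45   # main: predict time
ms_per_token = 285.17   # reported per-token cost

tokens = round(predict_ms / ms_per_token)  # ~138 tokens predicted
tokens_per_sec = 1000.0 / ms_per_token     # ~3.5 tokens/s on the M1 Ultra

print(f"{tokens} tokens at {tokens_per_sec:.2f} tok/s")
```

About 3.5 tokens per second for a 4-bit 65B model on an M1 Ultra, which is consistent with the model being memory-bandwidth bound at this size.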

turbo closed this as completed Mar 11, 2023
gjmulder added the "build Compilation issues" label Mar 15, 2023
Hades32 pushed a commit to Hades32/llama.cpp that referenced this issue Mar 21, 2023
Fix Makefile and Linux/MacOS CI
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this issue Jun 2, 2023
goerch mentioned this issue Oct 23, 2023
chsasank pushed a commit to chsasank/llama.cpp that referenced this issue Dec 20, 2023
* add TLDR and hw support

* enrich features section

* update model weights

* minor on README commands

* minor on features

* Update README.md

---------

Co-authored-by: Holden <[email protected]>
cebtenzzre added a commit that referenced this issue Jan 17, 2024
cebtenzzre added a commit that referenced this issue Jan 24, 2024
slaren mentioned this issue Aug 15, 2024
3 participants