
Segfault / Memory error with 65B model (128GB RAM) #12

Closed
turbo opened this issue Mar 11, 2023 · 2 comments
Labels
build Compilation issues

Comments


turbo commented Mar 11, 2023

On an M1 Ultra / 128GB, running the 65B model:

./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p "The word empowerment has five possible definitions:"

produces this error after everything has been loaded correctly:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268478672, available 268435456)

The 30B model runs fine (even on a 64GB M1 Max).

Full output
(base) ➜  llama.cpp git:(master) ✗ ./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p "The word empowerment has five possible definitions:"
main: seed = 1678533057
llama_model_load: loading model from './models/65B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 8192
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 64
llama_model_load: n_layer = 80
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 22016
llama_model_load: n_parts = 8
llama_model_load: ggml ctx size = 41477.73 MB
llama_model_load: memory_size =  2560.00 MB, n_mem = 40960
llama_model_load: loading model part 1/8 from './models/65B/ggml-model-q4_0.bin'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 2/8 from './models/65B/ggml-model-q4_0.bin.1'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 3/8 from './models/65B/ggml-model-q4_0.bin.2'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 4/8 from './models/65B/ggml-model-q4_0.bin.3'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 5/8 from './models/65B/ggml-model-q4_0.bin.4'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 6/8 from './models/65B/ggml-model-q4_0.bin.5'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 7/8 from './models/65B/ggml-model-q4_0.bin.6'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_model_load: loading model part 8/8 from './models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723

main: prompt: 'The word empowerment has five possible definitions:'
main: number of tokens in prompt = 11
   1 -> ''
1576 -> 'The'
1734 -> ' word'
3710 -> ' emp'
1680 -> 'ower'
 358 -> 'ment'
 756 -> ' has'
5320 -> ' five'
1950 -> ' possible'
15848 -> ' definitions'
29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268478672, available 268435456)
[1]    54172 segmentation fault  ./main -m ./models/65B/ggml-model-q4_0.bin -t 8 -n 128 -p
ggerganov (Owner) commented Mar 11, 2023

This was fixed here: 7d9ed7b

Just pull, run make, and it should be good.

turbo (Author) commented Mar 11, 2023

Nice, that worked 🥳

main: mem per token = 71159620 bytes
main:     load time = 15499.67 ms
main:   sample time =   277.16 ms
main:  predict time = 39353.45 ms / 285.17 ms per token
main:    total time = 56266.87 ms
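As a back-of-the-envelope check (assuming the per-token figure covers the predicted tokens), the timings above imply roughly:

```python
# Rough throughput implied by the timing lines above.
predict_ms = 39353.45   # main: predict time
ms_per_token = 285.17   # reported per-token cost

tokens = round(predict_ms / ms_per_token)  # ~138 tokens predicted
tokens_per_sec = 1000.0 / ms_per_token     # ~3.5 tokens/s on the M1 Ultra

print(f"{tokens} tokens at {tokens_per_sec:.2f} tok/s")
```

About 3.5 tokens per second for a 4-bit 65B model on an M1 Ultra, which is consistent with the model being memory-bandwidth bound at this size.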

turbo closed this as completed Mar 11, 2023
gjmulder added the "build Compilation issues" label Mar 15, 2023
Hades32 pushed a commit to Hades32/llama.cpp that referenced this issue Mar 21, 2023
Fix Makefile and Linux/MacOS CI
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this issue Jun 2, 2023
goerch mentioned this issue Oct 23, 2023
chsasank pushed a commit to chsasank/llama.cpp that referenced this issue Dec 20, 2023
* add TLDR and hw support

* enrich features section

* update model weights

* minor on README commands

* minor on features

* Update README.md

---------

Co-authored-by: Holden <[email protected]>
cebtenzzre added a commit that referenced this issue Jan 17, 2024
cebtenzzre added a commit that referenced this issue Jan 24, 2024
slaren mentioned this issue Aug 15, 2024
3 participants