
Model doesn't load on >2 GPU anymore. Says ggml_new_object: not enough space in the context's memory pool #4114

Closed
Ph0rk0z opened this issue Nov 17, 2023 · 9 comments · Fixed by #4115

Comments


Ph0rk0z commented Nov 17, 2023

Expected Behavior

Model loaded to 2x3090 + 1 or 2 P40 loads and functions:

llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 2192.00 MB
llama_new_context_with_model: kv self size  = 2192.00 MB
llama_build_graph: non-view tensors processed: 3155/3155
llama_new_context_with_model: compute buffer total size = 574.63 MB
llama_new_context_with_model: VRAM scratch buffer: 568.00 MB
llama_new_context_with_model: total VRAM used: 65972.68 MB (model: 63212.67 MB, context: 2760.00 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
2023-11-16 12:33:35 INFO:Loaded the model in 136.40 seconds.

Current Behavior

Model fails with an error:


ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required  =  141.08 MB
llm_load_tensors: offloading 137 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 140/140 layers to GPU
llm_load_tensors: VRAM used: 63212.67 MB
....................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 2192.00 MB
llama_new_context_with_model: kv self size  = 2192.00 MB
ggml_new_object: not enough space in the context's memory pool (needed 1638880, available 1638544)
Segmentation fault (core dumped)

Failure Information (for bugs)

I'm mainly using the Python bindings: v2.17 works and v2.18 doesn't, with the same settings. Whether I try to load a 180B or a 120B model, this is what I get. I have more than enough VRAM, but for some reason it dies allocating CPU RAM even though the model has already been loaded.

I tried NUMA and mlock to no avail. This is using the MMQ kernels, so nothing there should have changed.

The last commit it was working on was df9d129.

I tried reverting 1cf2850 manually but that wasn't it.

I will also try with today's commits and update this issue with what happens. I ruled out the Python wrapper as the cause by pairing the 2.18 bindings with a llama.cpp revision that is known to work.

slaren (Collaborator) commented Nov 17, 2023

Does increasing LLAMA_MAX_NODES in llama.cpp fix it?
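For reference, LLAMA_MAX_NODES is a compile-time constant near the top of llama.cpp, so increasing it means editing the source and rebuilding. A minimal sketch of the change; the default value shown is an assumption about this revision, not a quote of it:

// llama.cpp -- raise the graph node limit and rebuild
//#define LLAMA_MAX_NODES 4096   // assumed default at this revision
#define LLAMA_MAX_NODES 8192     // larger models build larger graphs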

Ph0rk0z (Author) commented Nov 17, 2023

Just tested it, and no (I set it to 2048), though that does halve the values it complains about. I have over 256 GB of CPU RAM, so it's interesting that it fails after asking for 80 GB and reporting 163 GB free.

slaren (Collaborator) commented Nov 17, 2023

You have to increase it, not decrease it. Try at least 8192.

Ph0rk0z (Author) commented Nov 17, 2023

Ok, I will do that.

Ph0rk0z (Author) commented Nov 17, 2023

Ok, that worked; the model loads again. Is there any reason to set this number higher?

slaren (Collaborator) commented Nov 17, 2023

It sets the maximum number of tensors in the computation graphs. Generally we want to keep it as low as possible to avoid wasting memory, but it seems that the larger models require a higher value.
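To make that concrete: the host-side scratch that backs the graph's ggml context is sized from this constant, so every tensor object created while building the graph has to fit into that pool. Roughly (paraphrased from memory, not a verbatim quote of llama.cpp at this revision):

// llama_new_context_with_model: the compute buffer is proportional to the node limit
ctx->buf_compute.resize(ggml_tensor_overhead()*LLAMA_MAX_NODES + ggml_graph_overhead());

// llama_build_graph: the graph itself is capped at the same limit
struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, LLAMA_MAX_NODES, false);

A 120B/180B model with 137 repeating layers simply creates more tensor objects than the default pool can hold, which is exactly the "ggml_new_object: not enough space in the context's memory pool" failure above.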

Ph0rk0z (Author) commented Nov 17, 2023

And that only affects CPU memory? I don't think I noticed any difference in VRAM.

slaren (Collaborator) commented Nov 17, 2023

Yes, it only increases CPU RAM usage, not VRAM.
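As a rough sanity check on how small that cost is, here is a tiny standalone calculation using only the byte counts from the error message above; the default node count of 4096 and the bump to 8192 are assumptions, not values stated in this thread:

#include <cstddef>
#include <cstdio>

int main() {
    // byte counts copied from the error message in this issue
    const size_t available = 1638544; // pool size with the default node limit
    const size_t needed    = 1638880; // what the large model's graph required

    // assumed values: default LLAMA_MAX_NODES and the suggested bump
    const size_t default_nodes = 4096;
    const size_t bumped_nodes  = 8192;

    const double per_node = double(available) / default_nodes; // ~400 bytes per node
    printf("approx. overhead per graph node: %.0f bytes\n", per_node);
    printf("shortfall that caused the failure: %zu bytes\n", needed - available);
    printf("extra host RAM from bumping to %zu nodes: ~%.1f MiB\n",
           bumped_nodes, per_node * (bumped_nodes - default_nodes) / (1024.0 * 1024.0));
    return 0;
}

In other words, doubling the limit costs on the order of a couple of MiB of host RAM, which is why it does not show up in VRAM usage at all.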

countzero commented:

I could reproduce the error trying to launch the Goliath 120B model:

ggml_new_object: not enough space in the context's memory pool (needed 1638880, available 1638544)

@slaren Thanks for the fix!
@ggerganov Thanks for the new release!

I can confirm that the issue is now fixed as of https://github.com/ggerganov/llama.cpp/releases/tag/b1535.

3 participants