
Segmentation Fault Error "not enough space in the context's memory pool" #52

Closed
cannin opened this issue Mar 12, 2023 · 22 comments
Labels: bug, need more info, stale

Comments

@cannin

cannin commented Mar 12, 2023

This prompt with the 65B model on an M1 Max (64 GB) results in a segmentation fault. It works with the 30B model. Are there problems with longer prompts? Related to #12

./main --model ./models/65B/ggml-model-q4_0.bin --prompt "You are a question answering bot that is able to answer questions about the world. You are extremely smart, knowledgeable, capable, and helpful. You always give complete, accurate, and very detailed responses to questions, and never stop a response in mid-sentence or mid-thought. You answer questions in the following format:

Question: What’s the history of bullfighting in Spain?

Answer: Bullfighting, also known as "tauromachia," has a long and storied history in Spain, with roots that can be traced back to ancient civilizations. The sport is believed to have originated in 7th-century BCE Iberian Peninsula as a form of animal worship, and it evolved over time to become a sport and form of entertainment. Bullfighting as it is known today became popular in Spain in the 17th and 18th centuries. During this time, the sport was heavily influenced by the traditions of medieval jousts and was performed by nobles and other members of the upper classes. Over time, bullfighting became more democratized and was performed by people from all walks of life. Bullfighting reached the height of its popularity in the 19th and early 20th centuries and was considered a national symbol of Spain. However, in recent decades, bullfighting has faced increasing opposition from animal rights activists, and its popularity has declined. Some regions of Spain have banned bullfighting, while others continue to hold bullfights as a cherished tradition. Despite its declining popularity, bullfighting remains an important part of Spanish culture and history, and it continues to be performed in many parts of the country to this day.

Now complete the following questions:

Question: What happened to the field of cybernetics in the 1970s?

Answer: "

Results in

...
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


You are a question answering bot that is able to answer questions about the world. You are extremely smart, knowledgeable, capable, and helpful. You always give complete, accurate, and very detailed responses to questions, and never stop a response in mid-sentence or mid-thought. You answer questions in the following format:

Question: What’s the history of bullfighting in Spain?

Answer: Bullfighting, also known as tauromachia, has a long and storied history in Spain, with roots that can be traced back to ancient civilizations. The sport is believed to have originated in 7th-century BCE Iberian Peninsula as a form of animal worship, and it evolved over time to become a sport and form of entertainment. Bullfighting as it is known today became popular in Spain in the 17th and 18th centuries. During this time, the sport was heavily influenced by the traditions of medieval jousts and was performed by nobles and other members of the upper classes. Over time, bullfighting became more democratized and was performed by people from all walks of life. Bullfighting reached the height of its popularity in the 19th and early 20th centuries and was considered a national symbol of Spain. However, in recent decades, bullfighting has faced increasing opposition from animal rights activists, and its popularity has declined. Some regions of Spain have banned bullfighting, while others continue to hold bullfights as a cherished tradition. Despite its declining popularity, bullfighting remainsggml_new_tensor_impl: not enough space in the context's memory pool (needed 701660720, available 700585498)
zsh: segmentation fault  ./main --model ./models/65B/ggml-model-q4_0.bin --prompt
@gjmulder
Collaborator

Are you running out of memory?

gjmulder added the "need more info" label Mar 15, 2023
@Gobz

Gobz commented Mar 16, 2023

I experience this as well, and I always have 5-6 GB of RAM free when it occurs and around 20 GB of swap.
It appears to be a known problem with memory allocation, based on ggerganov's comments in #71.

@Green-Sky
Collaborator

potentially fixed by #213

@edwios

edwios commented Mar 24, 2023

The latest commit b6b268d gives a segmentation fault right away, without even dropping into the input prompt. This was run on a Mac M1 Max with 64 GB RAM. The crash happened with the 30B LLaMA model but not with 7B. It was working fine even with the 65B model before this commit.

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 
Text transcript of a never ending dialog, where User interacts with an AI assistant named ChatLLaMa.
ChatLLaMa is helpful, kind, honest, friendly, good at writing and never fails to answer User’s requests immediately and with details and precision.
There are no annotations like (30 seconds passed...) or (to himself), just what User and ChatLLaMa say aloud to each other.
The dialog lasts for years, the entirety of it is shared below. It's 1000ggml_new_tensor_impl: not enough space in the context's memory pool (needed 536987232, available 536870912)
./chatLLaMa: line 53: 99012 Segmentation fault: 11  ./main $GEN_OPTIONS --model "$MODEL" --threads "$N_THREAD" --n_predict "$N_PREDICTS" --color --interactive --reverse-prompt "${USER_NAME}:" --prompt "

@Green-Sky
Collaborator

@edwios try 404e1da (the one before 483bab2), or try my PR #438 (closed since gg is going to do it differently, but it should still work until then).

@edwios

edwios commented Mar 24, 2023

The last known-good commit I have just tested was indeed 404e1da.

@ggerganov
Owner

What error do you get with 483bab2?

@edwios

edwios commented Mar 24, 2023

Same, ./chatLLaMa: line 53: 99012 Segmentation fault: 11 ./main $GEN_OPTIONS --model "$MODEL" --threads "$N_THREAD" --n_predict "$N_PREDICTS" --color --interactive --reverse-prompt "${USER_NAME}:" --prompt "

main-2023-03-24-155839.ips.zip

@Green-Sky
Collaborator

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 536987232, available 536870912)
Segmentation fault (core dumped)

Just a few batches in (30B q4_1; I just fed it a large file with -f).

@edwios

edwios commented Mar 25, 2023

Yippy! Commit 2a2e63c did fix the issue beautifully! Thank you!!

@eshaanagarwal

Hi, I am facing an out-of-memory error for the context while using the GPT4All 1.3 Groovy model on a machine with 32 CPUs and 512 GB RAM, using CPU inference.

@dukeeagle

Bumping @eshaanagarwal's comment! I am facing the same issue.

@sw
Contributor

sw commented Jul 10, 2023

Did it work for you with commit 2a2e63c and can you narrow down the commit that broke it?

In #1237, I changed some size_t parameters to int; I'm now worried that may be the culprit. This was done because the dequantize functions already used int for the number of elements.
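
As a rough illustration of that concern (this is not the actual ggml code, and the tensor size below is made up): a byte count that fits comfortably in size_t can wrap to a negative value once it is narrowed to a 32-bit int.

import numpy as np

n_elements = 70_000 * 8_192   # hypothetical element count for a large intermediate tensor
n_bytes = n_elements * 4      # float32 bytes: 2_293_760_000, which exceeds INT_MAX (2_147_483_647)

# A C-style narrowing cast to 32 bits silently wraps the value into negative territory.
wrapped = int(np.array([n_bytes], dtype=np.int64).astype(np.int32)[0])
print(n_bytes, "->", wrapped)

This only illustrates the suspicion above; it is not a confirmed diagnosis.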

@Uralstech

I am getting the same error: ggml_new_tensor_impl: not enough space in the context's memory pool (needed 20976224, available 12582912). I see that this has been a problem since March 12th.

I am using llama-2-13b-chat.ggmlv3.q3_K_S.bin from TheBloke on Google Cloud Run with 32 GB RAM and 8 vCPUs. The service is using llama-cpp-python.

I'm quite new to llama.cpp, so excuse any mistakes. This is the relevant part of my script:

app: FastAPI = FastAPI(title=APP_NAME, version=APP_VERSION)
llama: Llama = Llama(model_path=MODEL_PATH, n_ctx=4096, n_batch=2048, n_threads=cpu_count())

response_model: Type[BaseModel] = model_from_typed_dict(ChatCompletion)

# APP FUNCTIONS

@app.post("/api/chat", response_model=response_model)
async def chat(request: ChatCompletionsRequest) -> Union[ChatCompletion, EventSourceResponse]:
    print("Chat-completion request received!")

    completion_or_chunks: Union[ChatCompletion, Iterator[ChatCompletionChunk]] = llama.create_chat_completion(**request.dict(), max_tokens=4096)
    completion: ChatCompletion = completion_or_chunks

    print("Sending completion!")
    return completion
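
A side note on the configuration above: with n_ctx=4096 and max_tokens=4096, a long chat prompt plus the requested completion can exceed the context window. Below is a rough sketch of capping max_tokens to whatever room is left; remaining_budget and its headroom are illustrative helpers, assuming llama-cpp-python's tokenize() and n_ctx().

def remaining_budget(llm, messages, headroom: int = 256) -> int:
    # Count the prompt tokens and leave some headroom for the chat template.
    text = "\n".join(m["content"] for m in messages)
    n_prompt = len(llm.tokenize(text.encode("utf-8")))
    return max(16, llm.n_ctx() - n_prompt - headroom)

# In the handler, something along the lines of:
#   capped = remaining_budget(llama, request.dict().get("messages", []))
#   completion = llama.create_chat_completion(**request.dict(), max_tokens=capped)

This does not address the underlying allocation error, but it rules out one easy way to overrun the context.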

@anujcb

anujcb commented Aug 14, 2023

I am getting this with Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin, only when I send in embeddings from vector DB search results. Inference without the retriever works fine, without this issue. I will try the regular Llama 2 and see what happens. This is where it happens: in ConversationalRetrievalChain from langchain.chains.

(screenshot)

@anujcb

anujcb commented Aug 14, 2023

OK, I was able to make it work by reducing the number of docs to 1; any value above 1 throws the memory access violation.

(screenshot)

@slaren
Collaborator

slaren commented Aug 14, 2023

It would really help to diagnose this if you are able to reproduce it with one of the examples in this repository. If that's not possible, I would suggest looking into what parameters are being passed to llama_eval. This could happen if n_tokens is higher than n_batch, or if n_tokens + n_past is higher than n_ctx.
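
A minimal sketch of those checks, written against llama-cpp-python since that is what the recent reports here use (the model path and sizes are illustrative):

from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048, n_batch=512)

prompt = "Question: What are the risk factors for Tesla?\nAnswer:"
tokens = llm.tokenize(prompt.encode("utf-8"))
n_past = 0  # tokens already evaluated in this context (0 for a fresh prompt)

# The whole sequence must fit inside the context window; the high-level API splits
# the prompt into n_batch-sized chunks, but a manual eval loop must also keep each
# call's n_tokens <= n_batch.
assert n_past + len(tokens) <= llm.n_ctx(), "n_tokens + n_past exceeds n_ctx"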

@anujcb

anujcb commented Aug 14, 2023

I think the issue may be because of the special characters in the context. This was the context sent to the LLM to generate from:
(screenshot)
I debugged it and intercepted the call before this text was sent to the LLM. I copy-pasted it into TextPad to clean out the special characters, and it seemed to work.
(screenshot)

The context is produced from a vector DB containing chunks of Tesla's 10-K filings for the last 4 years. It looks like when the chunking was done, the special characters got into the vector DB, and the LLM was not able to process the special characters.

The prompt that went with the context was "what are the risk factors for Tesla?"

@anujcb

anujcb commented Aug 15, 2023

binary_path: F:\ProgramData\Anaconda3\envs\scrapalot-research-assistant\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
CUDA SETUP: Loading binary F:\ProgramData\Anaconda3\envs\scrapalot-research-assistant\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll...
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
INFO: Started server process [24636]
INFO: Waiting for application startup.
llama.cpp: loading model from ./../llama.cpp/models/Vicuna/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 596.40 MB (+ 2048.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 6106 MB
llama_new_context_with_model: kv self size = 2048.00 MB
AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

@anujcb

anujcb commented Aug 15, 2023

(Quoting my earlier comment about the special characters in the context.)

UPDATE: After extensive testing, I have concluded that this is not caused by the special characters; it is caused by the amount of text being sent as context. I can comfortably send about 2,000 characters (2 KB) without this memory issue, sometimes even more (I think this depends on how much memory I have free... maybe).

I am using CUDA with an old GPU and an older processor (AVX2 = 0), and 32 GB of memory.
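
If the amount of retrieved text is the trigger, one workaround is to trim the retrieved chunks by token count rather than character count before building the prompt. This is only a sketch (it assumes llama-cpp-python, and the budget numbers are illustrative), not a fix for the underlying allocation error:

from llama_cpp import Llama

llm = Llama(model_path="./models/Vicuna/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin", n_ctx=4096)

def build_prompt(chunks, question, max_new_tokens=512):
    # Reserve room for the answer, the question, and a little template headroom.
    budget = llm.n_ctx() - max_new_tokens - len(llm.tokenize(question.encode("utf-8"))) - 64
    kept, used = [], 0
    for chunk in chunks:
        n = len(llm.tokenize(chunk.encode("utf-8")))
        if used + n > budget:
            break
        kept.append(chunk)
        used += n
    context = "\n\n".join(kept)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"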

@anujcb

anujcb commented Aug 15, 2023

(Quoting slaren's suggestion above about reproducing the issue with one of the examples in this repository and checking the parameters passed to llama_eval.)

Is there an example where I can send in a context and the prompt?

Contributor

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 9, 2024