Segmentation Fault Error "not enough space in the context's memory pool" #52
Are you running out of memory?
I experience this as well, and I always have 5-6 GB of RAM free when it occurs, and around 20 GB of swap.
Potentially fixed by #213
Latest commit b6b268d gives a segmentation fault right away, without even dropping into the input prompt. This was run on a Mac M1 Max with 64 GB RAM. The crash happened with the 30B LLaMA model but not with 7B. It was working fine, even with the 65B model, before that commit.
Last known good commit I have just tested was indeed
What error do you get with 483bab2?
Same, just a few batches in (30B q4_1; I just fed it a large file with -f).
Yippee! Commit 2a2e63c did fix the issue beautifully! Thank you!!
Hi, I am facing the out-of-memory-for-context issue while using the GPT4All model 1.3-groovy on a 32-CPU, 512 GB RAM machine, using CPU inference.
Bumping @eshaanagarwal! Facing the same issue.
I am getting the same error: `ggml_new_tensor_impl: not enough space in the context's memory pool (needed 20976224, available 12582912)`. I see that this has been a problem since March 12th. I am using llama-2-13b-chat.ggmlv3.q3_K_S.bin from TheBloke on Google Cloud Run with 32 GB RAM and 8 vCPUs. The service uses llama-cpp-python. I'm quite new to llama.cpp, so excuse any mistakes. This is the relevant part of my script:

```python
# APP_NAME, APP_VERSION, MODEL_PATH, model_from_typed_dict and
# ChatCompletionsRequest are defined elsewhere in the script.
from os import cpu_count
from typing import Iterator, Type, Union

from fastapi import FastAPI
from llama_cpp import ChatCompletion, ChatCompletionChunk, Llama
from pydantic import BaseModel
from sse_starlette.sse import EventSourceResponse

app: FastAPI = FastAPI(title=APP_NAME, version=APP_VERSION)
llama: Llama = Llama(model_path=MODEL_PATH, n_ctx=4096, n_batch=2048, n_threads=cpu_count())
response_model: Type[BaseModel] = model_from_typed_dict(ChatCompletion)

# APP FUNCTIONS
@app.post("/api/chat", response_model=response_model)
async def chat(request: ChatCompletionsRequest) -> Union[ChatCompletion, EventSourceResponse]:
    print("Chat-completion request received!")
    completion_or_chunks: Union[ChatCompletion, Iterator[ChatCompletionChunk]] = llama.create_chat_completion(**request.dict(), max_tokens=4096)
    completion: ChatCompletion = completion_or_chunks
    print("Sending completion!")
    return completion
```
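Not an expert here, but one mitigation that is sometimes suggested for this error is shrinking `n_batch`, since larger batches need a larger ggml scratch/memory pool. A minimal sketch, assuming only llama-cpp-python's documented `Llama` constructor parameters (the model path is a placeholder):

```python
from os import cpu_count

from llama_cpp import Llama

# Sketch of a more conservative setup (placeholder model path).
# A smaller n_batch reduces the per-evaluation scratch memory ggml
# reserves, which is what the "memory pool" error complains about.
llama = Llama(
    model_path="./models/llama-2-13b-chat.ggmlv3.q3_K_S.bin",
    n_ctx=4096,      # context window shared by prompt + generated tokens
    n_batch=512,     # down from 2048; trades speed for memory headroom
    n_threads=cpu_count(),
)
```

Note also that `max_tokens=4096` together with `n_ctx=4096` leaves no headroom for the prompt itself: the prompt tokens and the generated tokens have to fit in the same context window.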
I am getting this with Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin, but only when I send in embeddings from vector-DB search results. Inference without the retriever works fine, without this issue. I will try regular Llama 2 and see what happens. This is where it happens: `from langchain.chains import ConversationalRetrievalChain`.
It would really help to diagnose this if you are able to reproduce it with one of the examples in this repository. If that's not possible, I would suggest looking into what parameters are being passed to
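For what it's worth, in the LangChain case those parameters end up on the `LlamaCpp` wrapper. A sketch, assuming `langchain.llms.LlamaCpp` and its `n_ctx`/`n_batch` parameters (the model path is a placeholder); llama-cpp-python's default context size of 512 tokens is easily exceeded once retrieved documents are stuffed into the prompt:

```python
from langchain.llms import LlamaCpp

# Sketch: raise the context window explicitly so retrieved chunks,
# the question, and the answer all fit (the default n_ctx is only 512).
llm = LlamaCpp(
    model_path="./models/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin",
    n_ctx=2048,
    n_batch=512,
)
```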
binary_path: F:\ProgramData\Anaconda3\envs\scrapalot-research-assistant\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
Is there an example where I can send in a context and the prompt?
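Not a maintainer, but with llama-cpp-python you can send context plus a question through `create_chat_completion`. A minimal sketch (the model path and strings are placeholders):

```python
from llama_cpp import Llama

llama = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048)

context = "Paste the retrieved documents or other background text here."
response = llama.create_chat_completion(
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "What does the context say about memory usage?"},
    ],
    max_tokens=256,  # leave headroom: prompt + output must fit within n_ctx
)
print(response["choices"][0]["message"]["content"])
```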
This issue was closed because it has been inactive for 14 days since being marked as stale.
This prompt with the 65B model on an M1 Max 64GB results in a segmentation fault. It works with the 30B model. Are there problems with longer prompts? Related to #12
Results in `ggml_new_tensor_impl: not enough space in the context's memory pool`, followed by a segmentation fault.
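One quick way to check whether a long prompt is simply blowing past the context window is to count its tokens first. A sketch using llama-cpp-python's `tokenize` (paths and sizes are placeholders; the original report used the llama.cpp CLI, not Python):

```python
from llama_cpp import Llama

llama = Llama(model_path="./models/65B/ggml-model-q4_0.bin", n_ctx=512)

with open("prompt.txt", "rb") as f:
    prompt = f.read()

tokens = llama.tokenize(prompt)  # tokenize() takes bytes in llama-cpp-python
print(f"{len(tokens)} prompt tokens vs. n_ctx=512")
```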