Model doesn't load on >2 GPUs anymore. Says ggml_new_object: not enough space in the context's memory pool #4114
Comments
Does increasing …
Just tested it and no (I set 2048), but it does halve the values it complains about. I have over 256 GB of CPU RAM, so it failing after asking for 80 GB while reporting 163 GB free is interesting.
You have to increase it, not decrease it. Try with at least 8192.
Ok, I will do that.
Ok, that worked, the model loads again. Any reason to set this number higher?
It sets the maximum number of tensors in the computation graphs. Generally we want to keep it as low as possible to avoid wasting memory, but it seems that the larger models require a higher value.
And that's only the CPU memory? I don't think I noticed any difference in terms of VRAM.
Yes, it only increases CPU RAM usage, not VRAM.
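For context, here is a minimal sketch of where that number ends up, assuming a placeholder constant MAX_GRAPH_NODES standing in for the value discussed above (the actual llama.cpp code differs in detail): the graph-building context's memory pool is sized from the per-tensor overhead times the node budget, and ggml_new_object fails with the error in the title when that pool runs out.

```c
// Hedged sketch (C, public ggml API): sizing the ggml context that holds
// graph/tensor metadata. MAX_GRAPH_NODES is a placeholder for the value
// being raised to 8192 in the discussion above.
#include "ggml.h"

#define MAX_GRAPH_NODES 8192

static struct ggml_context * make_graph_ctx(void) {
    // Reserve metadata space for up to MAX_GRAPH_NODES tensors plus the
    // graph bookkeeping; ggml_new_object draws from this pool and reports
    // "not enough space in the context's memory pool" when the budget is
    // too small for the model's computation graph.
    const size_t buf_size =
        ggml_tensor_overhead() * MAX_GRAPH_NODES + ggml_graph_overhead();

    struct ggml_init_params params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ NULL,   // let ggml allocate the pool (host RAM)
        /*.no_alloc   =*/ true,   // metadata only; tensor data lives elsewhere
    };
    return ggml_init(params);
}
```

This also matches the point above about memory: the pool holds tensor and graph metadata on the host, so raising the node budget costs CPU RAM, not VRAM.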
I could reproduce the error trying to launch the Goliath 120B model:
@slaren Thanks for the fix! I can confirm that the issue is now fixed with https://github.com/ggerganov/llama.cpp/releases/tag/b1535
Expected Behavior
A model loaded onto 2x 3090 + 1 or 2 P40s loads and functions.
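(The reporter is using the Python bindings; purely as a hedged illustration, this is roughly what such a multi-GPU load looks like through the llama.cpp C API of that period. The model path, layer count, and split ratios are made-up placeholders, not the reporter's actual settings.)

```c
// Hedged sketch: offloading a large model across several GPUs with the
// llama.cpp C API (llama.h as of roughly the b1535-era releases).
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init(false /* numa */);

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999;  // offload all layers
    // Illustrative split across 2x 3090 + 2x P40; real ratios depend on VRAM.
    static const float split[4] = { 0.30f, 0.30f, 0.20f, 0.20f };
    mparams.tensor_split = split;

    struct llama_model * model =
        llama_load_model_from_file("models/goliath-120b.Q4_K_M.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference here ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```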
Current Behavior
The model fails to load with the error "ggml_new_object: not enough space in the context's memory pool".
Failure Information (for bugs)
I'm mainly using the Python bindings: v2.17 works and v2.18 doesn't, with the same settings. I try to load a 180B or 120B model and this is what I get. I have more than enough VRAM, but for some reason it dies on CPU RAM despite the model already being loaded.
I tried NUMA and mlock to no avail. This is using the MMQ kernels, so nothing there should have changed.
The last commit it was working on was df9d129.
I tried reverting 1cf2850 manually, but that wasn't it.
I will also try with today's commits and update to see what happens. I ruled out the Python wrapper's code by using a llama.cpp revision that works with the 2.18 version.