Add Support for the new GGUF format which replaces GGML #3676
abetlen/llama-cpp-python#628
You can probably use abetlen/llama-cpp-python#633 if you merge it yourself. I would back up the old llama-cpp-python; llama.cpp does not care about breaking changes.
It looks like this is now done.
Still can't download them. Hope it will be supported soon :)
It looks like we're now waiting on https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels to be updated. I've added a request issue: jllllll/llama-cpp-python-cuBLAS-wheels#3
Wheels have been uploaded.
Keep in mind that this version of llama-cpp-python does not support GGML models. Only GGUF models. ctransformers has already been updated in the webui to support GGUF, if all you want is to try it out. Personally, I would prefer to wait until more GGML models are converted to GGUF before updating llama-cpp-python. If ooba wants, I can implement the previous version of llama-cpp-python as separate packages in order to maintain GGML support, but that is a pretty messy solution even if it is temporary. Alternatively, ctransformers can be used for GGML support, as it supports both formats. This isn't a great solution either, as ctransformers is noticeably slower than llama-cpp-python, for whatever reason.
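For anyone who just wants to try GGUF through the ctransformers loader in the meantime, here is a minimal sketch; the model file and settings below are placeholders, not anything from this thread:

```python
# Minimal sketch of loading a GGUF model via ctransformers instead of
# llama-cpp-python. The model path and generation settings are illustrative
# placeholders.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/codellama-7b.Q4_K_M.gguf",  # hypothetical local GGUF file
    model_type="llama",                 # tells ctransformers which architecture to use
    gpu_layers=35,                      # layers to offload to GPU, if built with CUDA
)

print(llm("def fibonacci(n):", max_new_tokens=64))
```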
Isn't there a conversion script? If so, why wait?
The conversion script is not guaranteed to work, and its usage can be somewhat involved in order to perform a proper conversion. As I mentioned, there aren't many GGUF models available right now, and the ctransformers loader already supports them.
Yeah, I just deleted all my GGML files because GGUF came out. Guess I'll stick with ctransformers for now, but I think people want the ease of not having to set any parameters that comes with GGUF.
Yes, I do think this would be the ideal approach. Otherwise, many people are going to complain about it suddenly not working. If a GGML file is detected, it just uses the older commit, while GGUF would use the new one.
What about just having the script convert them for people, so no duplicate llama-cpp-python is needed? Maybe a check and a "do you want to convert this to GGUF?" UI element.
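A rough sketch of what such a check could look like. The magic constants are the little-endian uint32 values llama.cpp uses for the legacy formats ('ggml', 'ggmf', 'ggjt') and for GGUF; the conversion script name and its -i/-o flags are assumptions based on the llama.cpp repo around this time and may differ between revisions:

```python
# Sketch of a "detect GGML and offer to convert" check. Magic values are the
# little-endian uint32 constants from the llama.cpp sources; the conversion
# script name and flags are assumptions, adjust for your checkout.
import struct
import subprocess
from pathlib import Path

GGUF_MAGIC = 0x46554747                              # reads as b"GGUF" on disk
GGML_MAGICS = {0x67676D6C, 0x67676D66, 0x67676A74}   # 'ggml', 'ggmf', 'ggjt'

def detect_format(path: Path) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"
    if magic in GGML_MAGICS:
        return "ggml"
    return "unknown"

def maybe_convert(path: Path) -> Path:
    if detect_format(path) != "ggml":
        return path
    answer = input(f"{path.name} is a GGML file. Convert it to GGUF? [y/N] ")
    if answer.strip().lower() != "y":
        return path
    out = path.with_suffix(".gguf")
    # Script name/flags per llama.cpp circa Aug 2023; may differ in your revision.
    subprocess.run(
        ["python", "convert-llama-ggmlv3-to-gguf.py", "-i", str(path), "-o", str(out)],
        check=True,
    )
    return out
```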
I have written the code needed to support this here: jllllll@4a999e3 |
The conversion script is not guaranteed to work with every model. Edit: The issue with converting |
Hope it works for those 70Bs I downloaded over the last week. I think this is the 3rd or 4th time they've deprecated a format, and it's always all or nothing.
Fortunately, GGUF is designed to be expandable. So, this should be the last format deprecation. |
I was just starting to experiment with installing local LLMs (I've wanted to try them for ages but been too busy), but it seems I've picked a tumultuous time to start, so I'm eagerly waiting for this migration to be complete so I can download and start playing with models that won't be obsolete in a week! Edit: It seems to now be supported! I'm successfully running TheBloke's GGUF CodeLlama release!
Works, but currently GGUF speaks like Yoda.
What is this I hear about GGUFv2, and header compatibility being removed by October?
@Ph0rk0z yes, GGUFv2 was released, and it is backward compatible :) They decided it was better to make the change now, because GGUFv1 is not widespread yet.
Right.. but they're deprecating the format again? Why not deprecate it now, or keep it? Waiting till October just means people upload already-obsolete models?
The GGUFv2 implementation is still compatible with v1. I don't think it will cause any issues. |
@Ph0rk0z well, someone says that backward compatibility seems to work fine: ggerganov/llama.cpp#2821
Right.. but I read in the code that support for reading GGUFv1 files is slated to be removed.
Why on earth would they do that? If they keep doing that, people just won't bother with GGUF. They could get away with frequent format deprecations in the past because there weren't all that many llama-based models back then. Now there are too many models using GGMLv3 for anyone to want to deal with constantly redownloading and reconverting models.
Yea.. so people will upload GGUFv1 models for the next 2 months, for what? It has only been a couple of days. Why not convert them now, while there are few? But instead there's no mention of this and we have to go looking for it. I have low bandwidth, so I can't just redownload every 70B. We had this situation before, when GPU offloading was first created: all the GGML models I had downloaded had to be requantized from scratch or redownloaded. You couldn't even use a script to convert them. And I can't just skip GGUF, because some good PRs got merged after the switch.
They are saying in this discussion that it's a simple command-line conversion from GGUF v1 to GGUF v2. The v1 deprecation is for simplicity rather than any technical reason to stop supporting it.
Right! Very simple! Little chance of failure.
'spose I'll see how it goes converting the current quants to GGUF.. but will they be GGUFv1 or v2 when I use the script now?
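For what it's worth, you can check a file yourself: per the GGUF spec, a GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 version field. A minimal sketch (the path is a placeholder):

```python
# Sketch: report the GGUF version of a file. The header layout (4-byte "GGUF"
# magic followed by a little-endian uint32 version) is per the GGUF spec.
import struct

def gguf_version(path: str) -> int:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        return version

print(gguf_version("models/llama-2-70b.Q4_K_M.gguf"))  # 1 for GGUFv1, 2 for GGUFv2
```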
After days of troubleshooting and literally having brain freeze over the "can't load the model" error, I came across this thread just now!! (RIP me.) I burned about 30 GB downloading all kinds of models and none of them are GGUF (talk about being unlucky). Will at least the GGUF version stay for a while now? I don't want to end up with unsupported versions and then break my head over the same issue again in a week's time. Also, which versions does langchain work with? Does the GPTQ version work, or is it only GGUF?
Convert the GGML to GGUF; if you downloaded them recently, it will work.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment. |
Llama.cpp has dropped support for the GGML format and now only supports GGUF.
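For reference, loading a GGUF model with the updated llama-cpp-python looks roughly like this (a sketch; the path and parameter values are placeholders):

```python
# Sketch of loading a GGUF model with a GGUF-era llama-cpp-python build,
# which no longer accepts GGML files. Path and parameters are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=35,   # requires a cuBLAS/Metal build for GPU offloading
)

out = llm("Q: What replaced GGML? A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```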