Generalize convert scripts #3838
Good initiative. I do however think we should remove the torch dependency, like the main |
A single convert.py script that can do all models would be awesome! |
I don't think we can, since we depend on |
here: #3633 (comment) If we merge this PR then I won't have to worry about inconsistencies like this cropping up (still unfixed) #3680 (comment) |
That's the one! Thanks
Pretty much my motivation |
Do we have some tool to view
Notes to self
Baichuan conversion is broken
Failed start
Successful start
It's like tensors aren't there, but the model size suggests they are.
gives the same results for both (compared via diff). |
The closest thing I know is https://github.com/huggingface/candle, which has a from_gguf function and Python bindings. But I don't know if it can even load Baichuan models. |
Based on the hexdump it looks like gguf_writer.ti_data_count is zero when tensors are written. That number should be the number of tensors. |
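One way to check the tensor count without reading a hexdump is to parse the fixed-size header directly. The sketch below assumes the GGUF v2+ header layout (4-byte magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata key/value count) and little-endian byte order. `GGUFHeader` and `read_gguf_header` are hypothetical names for illustration, not part of the gguf package.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumed v2+ layout, little-endian)."""
    raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # uint32 version, uint64 tensor count, uint64 metadata kv count
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)
```

Opening a converted file in binary mode and passing it to `read_gguf_header` gives the recorded tensor count. A count of zero in a multi-gigabyte file would match the symptom described above.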
Force-pushed from fcae724 to 0afa75a.
Not sure if Persimmon (the old script) works, it won't work with this model at the very least: https://huggingface.co/adept/persimmon-8b-base I'll have to implement that one from scratch. |
I haven't converted as many models as TheBloke, but over the last few weeks I've been converting models with "real" open source licenses on my Hugging Face account, maddes8cht, some of which are somewhat neglected by TheBloke. At the moment these are mainly Falcon and MPT models (Apache 2.0 licenses). Mistral is also getting a lot of attention from TheBloke at the moment, so there is no need to duplicate work. My experience with the new convert script on a Falcon 7B model (https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-7b): Ask me if you would like me to test something specific for you. |
@TheBloke sorry about that. Long story short, the converted model ended up big endian, not little endian.
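A quick way to spot this kind of byte-order mix-up is to look at the version field that follows the magic. The sketch below is a heuristic, not a definitive check: it assumes the file starts with the bytes `GGUF` regardless of byte order and that the next four bytes are a uint32 version holding a small number. `guess_gguf_byte_order` is a hypothetical helper name.

```python
import struct


def guess_gguf_byte_order(header: bytes) -> str:
    """Guess byte order from the uint32 version field after the magic.

    Heuristic: real version numbers are small, so the interpretation that
    yields the smaller value is taken to be the intended one.
    """
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF header")
    little = struct.unpack("<I", header[4:8])[0]
    big = struct.unpack(">I", header[4:8])[0]
    return "little" if little <= big else "big"
```

Passing the first eight bytes of a converted file is enough for this check.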
Yup
Yes, I'd like this script to support Llama models in huggingface format in the near future.
It should be a pretty simple change; I'll add it soon. EDIT: |
Confirming the checksum. I take the license issue very seriously and concentrate on models with "true" open source licenses. Apart from Falcon, Mistral and MPT, there's not much left. Even Bloom has quirky restrictions in its license that are not compatible with the open source idea. |
TBD in convert script and broken in llama.cpp (see #3837 (comment)). The current scripts only support the |
On my end we are good to merge this, if there are no more comments. I verified that old and new checksums match for all the models. |
Looks fine to me. |
Requested @ggerganov, so he is aware of this PR |
As mentioned in #3293, the
So somehow I consider it a bug that the convert script even builds these models, as no one could ever test whether it does something correct until someone reimplements the code in https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox to work with current Llama versions using the gguf file format (because said "examples" implementation only runs with ggml files produced by that example's own convert script, not the gguf files produced by this |
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
A lot of code is duplicated between multiple convert scripts like gpt-neox, mpt, bloom, baichuan. The usual flow of convert scripts is as follows:
The open model, get tokenizer, convert tensors, and write tensors steps are either similar or identical between different scripts. I refactored the code so that a Model class deals with steps 2-5 and allows for simple implementation of new models via inheritance, by overriding methods such as set_gguf_parameters.
This is mostly a draft PR to gauge whether it's worth investing time in this. Feel free to close it if it's not desired.
Supported models:
convert-bloom-hf-to-gguf.py
convert-baichuan-hf-to-gguf.py
convert-falcon-hf-to-gguf.py
convert-gptneox-hf-to-gguf.py
convert-mpt-hf-to-gguf.py
convert-persimmon-to-gguf.py
convert-refact-hf-to-gguf.py
convert-starcoder-hf-to-gguf.py
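The inheritance-based design described above can be sketched roughly as follows. Only the `Model` class name and the `set_gguf_parameters` method come from the PR description. The `convert` method, the `metadata` dict standing in for a GGUF writer, the `MptLikeModel` subclass, and the hyperparameter key names are assumptions made for illustration.

```python
class Model:
    """Base class holding the shared conversion flow (illustrative only)."""

    def __init__(self, hparams: dict):
        self.hparams = hparams
        # Stand-in for a GGUF writer: collects key/value metadata.
        self.metadata: dict = {}

    def set_gguf_parameters(self) -> None:
        # Default mapping; architecture-specific subclasses override this.
        self.metadata["block_count"] = self.hparams["num_hidden_layers"]

    def convert(self) -> dict:
        # The shared flow lives here once instead of in every script.
        self.set_gguf_parameters()
        return self.metadata


class MptLikeModel(Model):
    """A new architecture only overrides what differs from the default."""

    def set_gguf_parameters(self) -> None:
        # Illustrative key name for a config that labels layers differently.
        self.metadata["block_count"] = self.hparams["n_layers"]
```

With this shape, supporting another architecture means adding one small subclass rather than copying an entire script.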
TODO:
Checksum verified (again):
convert-bloom-hf-to-gguf.py
convert-baichuan-hf-to-gguf.py
convert-falcon-hf-to-gguf.py (needs change to tokenizer)
convert-gptneox-hf-to-gguf.py
convert-mpt-hf-to-gguf.py
convert-persimmon-to-gguf.py
convert-refact-hf-to-gguf.py
convert-starcoder-hf-to-gguf.py
Note:
When checking whether the old script and the new one convert the file correctly (using checksum), make sure the same input file is used (model.safetensors, pytorch_model.bin). The generic script uses .safetensors when both files are present, but other conversion scripts prefer pytorch_model.bin.
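The checksum comparison mentioned in the note can be done with a few lines of standard-library Python. This is a minimal sketch: `sha256_of` and `same_output` are hypothetical helper names, and SHA-256 is an assumption, since the thread does not say which checksum was used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(old_gguf: Path, new_gguf: Path) -> bool:
    """True when both converted files are byte-for-byte identical."""
    return sha256_of(old_gguf) == sha256_of(new_gguf)
```

For the comparison to be meaningful, both scripts have to be pointed at the same input weights, for example by temporarily keeping only one of the two weight files in the model directory.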