
Consider uploading some quantized checkpoints to Hugging Face #35

Open
Calandiel opened this issue Apr 21, 2023 · 2 comments

Comments

@Calandiel

Correct me if I'm wrong, but quantizing would require loading the models in their unquantized form first (as per the torch.load call in https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py, line 126). Not to mention how much heavier the unquantized models are on bandwidth.
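
For reference, a minimal sketch of that loading step (the checkpoint path below is hypothetical): torch.load deserializes the entire state dict at once, so the whole unquantized model ends up in RAM.

```python
# Rough sketch of the loading step in convert_pytorch_to_ggml.py.
# The checkpoint path is hypothetical; substitute a real .pth file.
import torch

# torch.load materializes the entire state dict in RAM at once.
state_dict = torch.load("RWKV-4-Raven-7B-v11x.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```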

@saharNooby
Collaborator

Only the PyTorch -> rwkv.cpp conversion requires loading the whole model into RAM; quantization is done tensor-by-tensor. You are right about the bandwidth, though.
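
As a minimal sketch of why that keeps memory low, block-wise quantization only needs one tensor resident at a time. Here numpy arrays stand in for ggml tensors, and the function is a Q8_0-style illustration, not rwkv.cpp's actual quantizer:

```python
# Hedged sketch of tensor-by-tensor, Q8_0-style quantization:
# one f32 scale per block of 32 values, int8 quants.
import numpy as np

def quantize_q8_0(tensor: np.ndarray, block_size: int = 32):
    """Per-block 8-bit quantization; assumes size divisible by block_size."""
    flat = tensor.astype(np.float32).reshape(-1, block_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0                        # avoid division by zero
    quants = np.round(flat / scale).astype(np.int8)
    return scale.astype(np.float32), quants

# Stream tensors one at a time; only the current tensor is in memory.
for name in ["emb.weight", "blocks.0.att.key.weight"]:  # illustrative names
    tensor = np.random.randn(64, 32)               # stand-in for a loaded tensor
    scale, quants = quantize_q8_0(tensor)
    print(name, scale.shape, quants.shape)
```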

I'll consider it, thanks for the suggestion!

@LoganDark
Contributor

I have uploaded some quantized RWKV-4-Raven models to HuggingFace at LoganDark/rwkv-4-raven-ggml. Conversion took about 2 hours; the upload took about 24 hours and required roughly 500 GB of disk space.

At the time of writing, the available models are:

| Name | f32 | f16 | Q4_0 | Q4_1 | Q4_2 | Q5_1 | Q8_0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-3B-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-7B-v11x-Eng99-20230429-ctx8192 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-14B-v11x-Eng99-20230501-ctx8192 | Split | Yes | Yes | No | Yes | Yes | Yes |

Feel free to create a discussion if you have a request.
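
For anyone who wants to try one of these, here is a hedged example of fetching a checkpoint with huggingface_hub. The exact filename inside the repo is an assumption, so check the repo's file listing first:

```python
# Download one of the quantized checkpoints from the Hugging Face Hub.
# The filename below is assumed for illustration; verify it against
# the actual files in LoganDark/rwkv-4-raven-ggml.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="LoganDark/rwkv-4-raven-ggml",
    filename="RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096-Q8_0.bin",  # assumed name
)
print("Model downloaded to:", path)
```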
