This issue has been closed due to inactivity for 6 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Description
Hugging Face's Quanto has implemented 4-bit and 2-bit KV cache quantization that is compatible with Transformers. See: https://huggingface.co/blog/kv-cache-quantization
I may open a PR when I have time to experiment.
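For anyone picking this up, here is a minimal plain-Python sketch of the core idea: asymmetric (min-max) group-wise quantization of cached key/value numbers down to 2 or 4 bits, with a per-group scale and zero point used to dequantize. This is an illustration under stated assumptions, not Quanto's implementation. The names `quantize_kv`, `dequantize_kv` and `group_size` are hypothetical. Real implementations work on tensors along a chosen axis, and the linked post also describes keeping the most recent tokens in original precision, which this sketch omits.

```python
from typing import List, Tuple

# One quantized group: (integer codes, scale, zero point). Hypothetical layout.
Group = Tuple[List[int], float, float]


def quantize_group(values: List[float], nbits: int) -> Group:
    """Asymmetric min-max quantization of one group to unsigned nbits codes."""
    qmax = (1 << nbits) - 1
    lo, hi = min(values), max(values)
    # Constant groups get scale 1.0 so every code is 0 and dequantizes to lo exactly.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Clamp guards against float rounding pushing a code outside [0, qmax].
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    """Map integer codes back to approximate floats: x ~ code * scale + zero."""
    codes, scale, zero = group
    return [c * scale + zero for c in codes]


def quantize_kv(values: List[float], nbits: int = 4, group_size: int = 64) -> List[Group]:
    """Split a flat list of cache values into groups and quantize each one."""
    return [
        quantize_group(values[start:start + group_size], nbits)
        for start in range(0, len(values), group_size)
    ]


def dequantize_kv(groups: List[Group]) -> List[float]:
    """Reconstruct the flat list of values from its quantized groups."""
    out: List[float] = []
    for group in groups:
        out.extend(dequantize_group(group))
    return out


if __name__ == "__main__":
    cache = [0.1 * i for i in range(10)]
    restored = dequantize_kv(quantize_kv(cache, nbits=2, group_size=4))
    print([round(x, 3) for x in restored])
```

Each reconstructed value is within half a quantization step (`scale / 2`) of the original. Smaller groups tighten that bound at the cost of storing more scales and zero points. Packed tightly, 4-bit codes take a quarter of the space of fp16 values and 2-bit codes an eighth, before that per-group metadata overhead.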