Handle .to for 4bit quantized models #186

g8a9 · 2023-05-24T21:27:10Z

Description

This PR extends the existing check that skips calling .to on 8bit quantized HF models so that it also covers models loaded with 4bit quantization.
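For illustration, the kind of guard described here could look roughly like the sketch below. It is not the code from this PR: `DummyModel` and `move_model_if_allowed` are hypothetical names, and the sketch assumes the loaded model exposes `is_loaded_in_8bit` and `is_loaded_in_4bit` boolean flags.

```python
class DummyModel:
    """Hypothetical stand-in for an HF model; it only records .to calls."""

    def __init__(self, is_loaded_in_8bit=False, is_loaded_in_4bit=False):
        # Assumed flags marking a quantized model.
        self.is_loaded_in_8bit = is_loaded_in_8bit
        self.is_loaded_in_4bit = is_loaded_in_4bit
        self.device = "cpu"
        self.to_calls = 0

    def to(self, device):
        self.to_calls += 1
        self.device = device
        return self


def move_model_if_allowed(model, device):
    """Call model.to(device) unless the model is 8bit or 4bit quantized."""
    is_quantized = getattr(model, "is_loaded_in_8bit", False) or getattr(
        model, "is_loaded_in_4bit", False
    )
    if not is_quantized:
        # Only non-quantized models are moved explicitly.
        model.to(device)
    return model
```

With this sketch, a non-quantized `DummyModel` is moved to the requested device, while one created with `is_loaded_in_4bit=True` (or `is_loaded_in_8bit=True`) is left where it was loaded.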

gsarti

LGTM, thanks for the timely fix!

Handle .to for 4bit quantized models

d9210fc

gsarti approved these changes May 25, 2023

gsarti merged commit e3b1f59 into inseq-team:main May 25, 2023

gsarti added this to the v0.5 milestone Jul 21, 2023