ggml now supports Q1_O quantization, which has been reported to offer better inference quality for some models at the cost of slower execution. At the same time, Open Assistant has released newer weights for the Pythia-based model than the ones currently being pulled.

Perhaps it would be worth updating the model on Hugging Face using the new quantization method?

I would open a PR for this myself, but I don't have access to a GPU with enough RAM to quantize the 12B model.