FP8 version #224
Hi @themrzmaster, you mean 8-bit AWQ, right? Which version are you interested in, v2.5 or v3?
v3! Thanks.
@themrzmaster @khai-meetkai you can live-quantize with vLLM by passing `quantization="fp8"` at load time, so a pre-quantized checkpoint isn't strictly required.
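For concreteness, a minimal sketch of that flow with vLLM's offline API (the model id is just an example, and this assumes an FP8-capable GPU such as Hopper or Ada):

```python
# Minimal sketch: ask vLLM to quantize the weights to FP8 at load time,
# so no pre-quantized checkpoint is needed.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meetkai/functionary-medium-v3.1",  # original BF16 checkpoint (example)
    quantization="fp8",                       # dynamic FP8 weight quantization
)
out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```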
@khai-meetkai one more note on AWQ quants (you probably know this already, but just in case): it's quite important that the calibration dataset aligns with the function-calling use case, so it's probably a good idea to calibrate not just on a default dataset but on one mixed with your own data (including some function-calling samples). This makes AWQ quants (especially 4-bit) a bit more optimized and reliable. We tested this on some of the older medium Functionary models and got better results by expanding the AWQ calibration dataset with synthetically generated function-calling data from your original model.
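In case it helps, here is a rough sketch of passing such a mixed calibration set to AutoAWQ (the model id, dataset, and sample counts are placeholders, not what was actually used):

```python
# Rough AutoAWQ sketch: 4-bit AWQ with a calibration set that mixes generic
# text with function-calling samples. All ids/paths here are placeholders.
from awq import AutoAWQForCausalLM
from datasets import load_dataset
from transformers import AutoTokenizer

model_path = "meetkai/functionary-medium-v3.1"  # example model
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Mix a generic corpus with rendered function-calling conversations.
generic = load_dataset("mit-han-lab/pile-val-backup", split="validation")
fc_texts = [
    # replace with real samples rendered in the model's chat format
    'User: weather in Paris?\nAssistant: get_weather({"city": "Paris"})',
]
calib_data = generic["text"][:256] + fc_texts

# AutoAWQ accepts a list of raw strings as calibration data.
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("functionary-medium-awq")
tokenizer.save_pretrained("functionary-medium-awq")
```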
Hi @localmind-ai, thank you for the reminder! Yes, the calibration dataset should also be function-calling data. Currently we don't have any plans to create AWQ quants, as we have more urgent tasks, but we will definitely use function-calling data for calibration if we do.
Thanks for the information, @khai-meetkai! Fully understandable.
@localmind-ai We have just released meetkai/functionary-medium-v3.1-fp8, using a small part of the training data as calibration data. In our evaluation, the quantized model gives almost the same results as the original model.
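For anyone landing here later, loading that checkpoint with vLLM should be straightforward, since vLLM picks up the FP8 scheme from the checkpoint config (untested sketch):

```python
# Sketch: serve the released FP8 checkpoint; vLLM reads the quantization
# config from the model files, so no extra flag is needed.
from vllm import LLM, SamplingParams

llm = LLM(model="meetkai/functionary-medium-v3.1-fp8")
out = llm.generate(["Hi!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```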
Thanks for your work!
It would be nice to have FP8 versions available on HF, as vLLM has special kernels for it and FlashAttention 3 is moving in that direction too.
Thanks