In the Hugging Face config, I set quant_mode = True.
The weight_integer buffer stays at 0, and the result is wrong.
Moreover, the inference latency in integer mode is 20 times that of float mode.
Can you please explain why this happens?
Similarly, I also found that it is MUCH slower with quant_mode = True. Here's a notebook with a slightly modified version of the HF code that allows dynamically switching quant_mode; it shows the timing difference.
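For anyone who wants to reproduce the comparison outside the notebook, here is a minimal sketch of a timing harness. It is not the notebook's code: `median_latency` and `slowdown` are hypothetical helper names, and the two callables you pass in are assumed to wrap one forward pass each (one with quant_mode off, one with it on). It uses only the standard library, so it measures wall-clock time on the host.

```python
import statistics
import time


def median_latency(fn, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in seconds.

    Hypothetical helper: fn is any zero-argument callable, for example
    a lambda that runs one forward pass of the model.
    """
    # Warm-up calls are discarded so one-time setup cost is not measured.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(float_fn, quant_fn, warmup=3, repeats=10):
    """Return how many times slower quant_fn is than float_fn."""
    float_s = median_latency(float_fn, warmup, repeats)
    quant_s = median_latency(quant_fn, warmup, repeats)
    return quant_s / float_s


if __name__ == "__main__":
    # Placeholder workloads standing in for the two forward passes.
    ratio = slowdown(lambda: time.sleep(0.001), lambda: time.sleep(0.01))
    print(f"integer mode / float mode latency ratio: {ratio:.1f}")
```

If the model runs on a GPU, the callables would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`) before returning, otherwise the timer only captures the kernel launch.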