[SYCL] Support newer non linear quantization #5674
Comments
We haven't supported iq3_xxs yet, but you can try implementing it yourself based on the CUDA code. Feel free to ping us here if you encounter any issues. (Reference: lines 14970 to 14972 in 15499eb.)
I see. Thanks. Should I close this then?
Thank you for informing!
Up to you.
I am reopening this. I may or may not work on this since I am not a C person. I wish I could set the label to "help-wanted".
@ggerganov Thank you!
Can you try this PR and report the results?
Got a ggml assert while trying to test an IQ3_XXS quantized gguf:
The other non-IQ quants are looking good, although I didn't understand the warning I am getting about ext_intel_free_memory.
What model are you using? Can you share a link?
This time I used Eris 7B to test IQ3. For the linear quants I used Mistral 7B.
Export/set ZES_ENABLE_SYSMAN=1 so that ext_intel_free_memory() can report free memory.
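For anyone hitting the same warning, a minimal sketch of setting the variable in a Linux shell before launching the program. Only the variable name and value come from the comment above; the `printenv` check is just an illustrative way to confirm it is set. On Windows cmd the equivalent would be `set ZES_ENABLE_SYSMAN=1`.

```shell
# Enable the variable for the current shell session, as suggested above
export ZES_ENABLE_SYSMAN=1

# Optional check: confirm the variable is exported to child processes
printenv ZES_ENABLE_SYSMAN
```

The setting only lasts for the current shell session, so it has to be exported again in a new terminal (or added to a shell profile) before running the SYCL build.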
@akarshanbiswas Could you try the 3.75-bit/1.5-bit quants (#5886) and let us know how it goes with the Eris model?
@abhilash1910 Failed with
Edit: Looks like everything is broken.
Edit 2: Confirmed that the PR is the culprit!
mistral-7b-instruct-v0.2.Q4_K_M.gguf passed in another test.
I have asked the developer to fix it.
Thank you.
This should be fixed in #6521.
Awesome!! Thank you, everyone, for all the hard work. I will try testing a model with this patch today (hopefully)!
Update: Just tested an iq4_XXS gemma 7B (8.5B) 1.1 model, which was released recently, and it is working great!!! Closing this issue as solved for now!!
Thanks for confirming, @akarshanbiswas.
cc: @abhilash1910 @airMeng
Also important!