
[SYCL] Support newer non-linear quantization #5674

Closed · qnixsynapse opened this issue Feb 23, 2024 · 20 comments
Labels: bug-unconfirmed, help wanted

Comments

qnixsynapse (Contributor) commented Feb 23, 2024

cc: @abhilash1910 @airMeng

Also important!

airMeng (Collaborator) commented Feb 23, 2024

We haven't supported IQ3_XXS yet, but you can try to implement it yourself based on the CUDA code. Feel free to ping us here if you run into any issues:

llama.cpp/ggml-sycl.cpp, lines 14970 to 14972 at 15499eb:

    if (a->type == GGML_TYPE_IQ3_XXS) {
        return false;
    }
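
For reference, porting one of these kernels is mostly a matter of translating the CUDA launch and indexing idioms into SYCL. The sketch below is illustrative only: block_toy and dequantize_row_toy_sycl are invented stand-ins for the real block_iq3_xxs structure and its codebook tables, not llama.cpp code.

    // A minimal sketch of the usual CUDA -> SYCL mapping for a dequantize
    // kernel that launches one work-group per quant block.
    #include <sycl/sycl.hpp>
    #include <cstdint>

    struct block_toy {          // stand-in for block_iq3_xxs
        float   d;              // per-block scale (stored as fp16 upstream)
        uint8_t qs[32];         // packed quants
    };

    static void dequantize_row_toy_sycl(const block_toy *x, float *y, int nb,
                                        sycl::queue &q) {
        // CUDA: dequantize_block<<<nb, 32>>>(x, y);
        q.parallel_for(
            sycl::nd_range<1>(sycl::range<1>(nb * 32), sycl::range<1>(32)),
            [=](sycl::nd_item<1> it) {
                const int ib  = it.get_group(0);     // blockIdx.x
                const int tid = it.get_local_id(0);  // threadIdx.x
                // the real IQ3_XXS kernel does a codebook lookup here
                y[ib * 32 + tid] = x[ib].d * (float) x[ib].qs[tid];
            });
    }

Once such kernels exist, the gate shown above can be removed so the backend stops rejecting IQ3_XXS tensors.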

qnixsynapse (Contributor, Author) commented:

I see. Thanks. Should I close this then?

airMeng (Collaborator) commented Feb 23, 2024

> Also important!

Thank you for informing!

> I see. Thanks. Should I close this then?

Up to you.

qnixsynapse reopened this on Mar 1, 2024
qnixsynapse changed the title from "[SYCL] IQ3_XXS quantized models seems to be getting a ggml_assert: ggml-sycl.cpp:12141: to_fp16_sycl != nullptr" to "[SYCL] Support newer non-linear quantization" on Mar 1, 2024
qnixsynapse (Contributor, Author) commented:

I am reopening this. I may or may not work on it myself, since I am not a C person. I wish I could set the label to "help wanted".

ggerganov added the "help wanted" label on Mar 1, 2024
qnixsynapse (Contributor, Author) commented:

@ggerganov Thank you!

airMeng (Collaborator) commented Mar 4, 2024

Can you try #5862 and report the results?

qnixsynapse (Contributor, Author) commented Mar 4, 2024

Got a GGML_ASSERT while testing an IQ3_XXS-quantized GGUF:

    ...
    ...
    get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
    ...
    ...
    GGML_ASSERT: llama.cpp/ggml-sycl.cpp:14097: to_fp16_sycl != nullptr

The other, non-IQ quants are looking good, although I don't understand the warning I am getting about ext_intel_free_memory.
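
For context, this assert fires because the fp16 conversion dispatcher has no kernel registered for the tensor's quantization type. A hedged sketch of that pattern follows; the function name matches the assert, but the body and the stand-in kernel declarations are illustrative, not the actual ggml-sycl.cpp code.

    // Illustrative: types without a case fall through to nullptr, which then
    // trips GGML_ASSERT(to_fp16_sycl != nullptr) in the caller.
    #include <sycl/sycl.hpp>
    #include "ggml.h" // assumed include providing ggml_type

    typedef void (*to_fp16_sycl_t)(const void *x, sycl::half *y, int k,
                                   sycl::queue &q);

    // stand-in declarations for the real per-type dequantize kernels
    void dequantize_row_q4_0_sycl(const void *, sycl::half *, int, sycl::queue &);
    void dequantize_row_q8_0_sycl(const void *, sycl::half *, int, sycl::queue &);

    static to_fp16_sycl_t ggml_get_to_fp16_sycl(ggml_type type) {
        switch (type) {
            case GGML_TYPE_Q4_0: return dequantize_row_q4_0_sycl;
            case GGML_TYPE_Q8_0: return dequantize_row_q8_0_sycl;
            // no GGML_TYPE_IQ3_XXS case yet -> nullptr -> assert
            default: return nullptr;
        }
    }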

airMeng (Collaborator) commented Mar 4, 2024

What model are you using? Can you share a link?

qnixsynapse (Contributor, Author) commented Mar 4, 2024

This time I used Eris 7B to test IQ3.

And for linear quants I used Mistral 7B.

NeoZhangJianyu (Collaborator) commented:

> get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory

Export/set ZES_ENABLE_SYSMAN=1 so that ext_intel_free_memory() can report free memory.
It's not important if your GPU is not shared.
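
A minimal standalone sketch of that fallback, assuming DPC++'s sycl_ext_intel_device_info extension (which only reports free memory when ZES_ENABLE_SYSMAN=1 is set in the environment):

    // Query free device memory if available, else fall back to total memory,
    // mirroring the "use total memory as free memory" warning above.
    #include <sycl/sycl.hpp>
    #include <cstdint>
    #include <cstdio>

    int main() {
        sycl::device dev{sycl::gpu_selector_v};
        uint64_t total = dev.get_info<sycl::info::device::global_mem_size>();
        uint64_t free  = total; // fallback when SYSMAN is not enabled
        if (dev.has(sycl::aspect::ext_intel_free_memory)) {
            free = dev.get_info<sycl::ext::intel::info::device::free_memory>();
        }
        std::printf("free: %llu / total: %llu bytes\n",
                    (unsigned long long) free, (unsigned long long) total);
        return 0;
    }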

abhilash1910 (Collaborator) commented:

> This time I used Eris 7B to test IQ3.
> And for linear quants I used Mistral 7B.

@akarshanbiswas Could you try 3.75-bit / 1.5-bit (#5886) and let us know for the Eris model?

qnixsynapse (Contributor, Author) commented Mar 8, 2024

@abhilash1910 Failed with:

    main: ggml-sycl.cpp:2968: dpct::detail::device_memory<const unsigned int, dpct::global, 1>::device_memory(const sycl::range<Dimension> &, std::initializer_list<value_t> &&) [T = const unsigned int, Memory = dpct::global, Dimension = 1]: Assertion `init_list.size() <= in_range.size()' failed.

Edit: Looks like everything is broken.

Edit 2: Confirmed that the PR is the culprit!
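
For anyone hitting the same assertion: it indicates a dpct-managed device constant table whose initializer list is larger than its declared extent, i.e. a lookup table and its declared size fell out of sync. A hypothetical reproduction, assuming oneAPI's dpct helper headers (the table name is invented):

    // Declared with room for 4 elements but initialized with 6, so
    // init_list.size() > in_range.size() and the device_memory ctor asserts.
    #include <dpct/dpct.hpp>

    static dpct::global_memory<const unsigned int, 1> k_bad_table(
        sycl::range<1>(4), {1u, 2u, 3u, 4u, 5u, 6u});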

NeoZhangJianyu (Collaborator) commented:

mistral-7b-instruct-v0.2.Q4_K_M.gguf passes in our other tests. Please try with this model.

qnixsynapse (Contributor, Author) commented:

NeoZhangJianyu (Collaborator) commented:

I have asked the developer to fix it.

qnixsynapse (Contributor, Author) commented:

Thank you.

airMeng (Collaborator) commented Apr 7, 2024

Should be fixed in #6521.

qnixsynapse (Contributor, Author) commented Apr 7, 2024

Awesome! Thank you, everyone, for all the hard work. I will (hopefully) try testing a model with this patch today!

qnixsynapse (Contributor, Author) commented:

Update: just tested the recently released iq4_XXS Gemma 7B (8.5B) 1.1 model, and it is working great!

Closing this issue as solved for now.

abhilash1910 (Collaborator) commented:

Thanks for confirming, @akarshanbiswas.
