
[SYCL] Support newer non-linear quantization #5674

Closed · qnixsynapse opened this issue Feb 23, 2024 · 20 comments
Labels: bug-unconfirmed, help wanted

Comments

qnixsynapse (Contributor) commented Feb 23, 2024

cc: @abhilash1910 @airMeng

Also important!

airMeng (Collaborator) commented Feb 23, 2024

We haven't supported IQ3_XXS yet, but you can try to implement it yourself based on the CUDA code. Feel free to ping us here if you run into any issues:

llama.cpp/ggml-sycl.cpp, lines 14970 to 14972 at 15499eb:

    if (a->type == GGML_TYPE_IQ3_XXS) {
        return false;
    }
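
For reference, porting one of these kernels is mostly a matter of translating the CUDA launch and indexing idioms into SYCL. The sketch below is illustrative only: block_toy and dequantize_row_toy_sycl are invented stand-ins for the real block_iq3_xxs structure and its codebook tables, not llama.cpp code.

    // A minimal sketch of the usual CUDA -> SYCL mapping for a dequantize
    // kernel that launches one work-group per quant block.
    #include <sycl/sycl.hpp>
    #include <cstdint>

    struct block_toy {          // stand-in for block_iq3_xxs
        float   d;              // per-block scale (stored as fp16 upstream)
        uint8_t qs[32];         // packed quants
    };

    static void dequantize_row_toy_sycl(const block_toy *x, float *y, int nb,
                                        sycl::queue &q) {
        // CUDA: dequantize_block<<<nb, 32>>>(x, y);
        q.parallel_for(
            sycl::nd_range<1>(sycl::range<1>(nb * 32), sycl::range<1>(32)),
            [=](sycl::nd_item<1> it) {
                const int ib  = it.get_group(0);     // blockIdx.x
                const int tid = it.get_local_id(0);  // threadIdx.x
                // the real IQ3_XXS kernel does a codebook lookup here
                y[ib * 32 + tid] = x[ib].d * (float) x[ib].qs[tid];
            });
    }

Once such kernels exist, the gate shown above can be removed so the backend stops rejecting IQ3_XXS tensors.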

qnixsynapse (Contributor, Author) commented:

I see. Thanks. Should I close this then?

airMeng (Collaborator) commented Feb 23, 2024

> Also important!

Thank you for informing!

> I see. Thanks. Should I close this then?

Up to you.

qnixsynapse reopened this on Mar 1, 2024
qnixsynapse changed the title from "[SYCL] IQ3_XXS quantized models seems to be getting a ggml_assert: ggml-sycl.cpp:12141: to_fp16_sycl != nullptr" to "[SYCL] Support newer non-linear quantization" on Mar 1, 2024
qnixsynapse (Contributor, Author) commented:

I am reopening this. I may or may not work on it myself, since I am not a C person. I wish I could set the label to "help wanted".

ggerganov added the "help wanted" label on Mar 1, 2024
qnixsynapse (Contributor, Author) commented:

@ggerganov Thank you!

airMeng (Collaborator) commented Mar 4, 2024

Can you try #5862 and report the results?

qnixsynapse (Contributor, Author) commented Mar 4, 2024

Got a GGML_ASSERT while testing an IQ3_XXS-quantized GGUF:

    ...
    ...
    get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
    ...
    ...
    GGML_ASSERT: llama.cpp/ggml-sycl.cpp:14097: to_fp16_sycl != nullptr

The other, non-IQ quants are looking good, although I don't understand the warning I am getting about ext_intel_free_memory.
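
For context, this assert fires because the fp16 conversion dispatcher has no kernel registered for the tensor's quantization type. A hedged sketch of that pattern follows; the function name matches the assert, but the body and the stand-in kernel declarations are illustrative, not the actual ggml-sycl.cpp code.

    // Illustrative: types without a case fall through to nullptr, which then
    // trips GGML_ASSERT(to_fp16_sycl != nullptr) in the caller.
    #include <sycl/sycl.hpp>
    #include "ggml.h" // assumed include providing ggml_type

    typedef void (*to_fp16_sycl_t)(const void *x, sycl::half *y, int k,
                                   sycl::queue &q);

    // stand-in declarations for the real per-type dequantize kernels
    void dequantize_row_q4_0_sycl(const void *, sycl::half *, int, sycl::queue &);
    void dequantize_row_q8_0_sycl(const void *, sycl::half *, int, sycl::queue &);

    static to_fp16_sycl_t ggml_get_to_fp16_sycl(ggml_type type) {
        switch (type) {
            case GGML_TYPE_Q4_0: return dequantize_row_q4_0_sycl;
            case GGML_TYPE_Q8_0: return dequantize_row_q8_0_sycl;
            // no GGML_TYPE_IQ3_XXS case yet -> nullptr -> assert
            default: return nullptr;
        }
    }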

airMeng (Collaborator) commented Mar 4, 2024

What model are you using? Can you share a link?

qnixsynapse (Contributor, Author) commented Mar 4, 2024

This time I used Eris 7B to test IQ3.

And for linear quants I used Mistral 7B.

NeoZhangJianyu (Collaborator) commented:

> get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory

Export/set ZES_ENABLE_SYSMAN=1 so that ext_intel_free_memory() can report free memory.
It's not important if your GPU is not shared.
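
A minimal standalone sketch of that fallback, assuming DPC++'s sycl_ext_intel_device_info extension (which only reports free memory when ZES_ENABLE_SYSMAN=1 is set in the environment):

    // Query free device memory if available, else fall back to total memory,
    // mirroring the "use total memory as free memory" warning above.
    #include <sycl/sycl.hpp>
    #include <cstdint>
    #include <cstdio>

    int main() {
        sycl::device dev{sycl::gpu_selector_v};
        uint64_t total = dev.get_info<sycl::info::device::global_mem_size>();
        uint64_t free  = total; // fallback when SYSMAN is not enabled
        if (dev.has(sycl::aspect::ext_intel_free_memory)) {
            free = dev.get_info<sycl::ext::intel::info::device::free_memory>();
        }
        std::printf("free: %llu / total: %llu bytes\n",
                    (unsigned long long) free, (unsigned long long) total);
        return 0;
    }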

abhilash1910 (Collaborator) commented:

> This time I used Eris 7B to test IQ3.
> And for linear quants I used Mistral 7B.

@akarshanbiswas Could you try 3.75-bit / 1.5-bit (#5886) and let us know for the Eris model?

qnixsynapse (Contributor, Author) commented Mar 8, 2024

@abhilash1910 Failed with:

    main: ggml-sycl.cpp:2968: dpct::detail::device_memory<const unsigned int, dpct::global, 1>::device_memory(const sycl::range<Dimension> &, std::initializer_list<value_t> &&) [T = const unsigned int, Memory = dpct::global, Dimension = 1]: Assertion `init_list.size() <= in_range.size()' failed.

Edit: Looks like everything is broken.

Edit 2: Confirmed that the PR is the culprit!
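
For anyone hitting the same assertion: it indicates a dpct-managed device constant table whose initializer list is larger than its declared extent, i.e. a lookup table and its declared size fell out of sync. A hypothetical reproduction, assuming oneAPI's dpct helper headers (the table name is invented):

    // Declared with room for 4 elements but initialized with 6, so
    // init_list.size() > in_range.size() and the device_memory ctor asserts.
    #include <dpct/dpct.hpp>

    static dpct::global_memory<const unsigned int, 1> k_bad_table(
        sycl::range<1>(4), {1u, 2u, 3u, 4u, 5u, 6u});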

NeoZhangJianyu (Collaborator) commented:

mistral-7b-instruct-v0.2.Q4_K_M.gguf passes in our other tests. Please try with this model.

qnixsynapse (Contributor, Author) commented:

NeoZhangJianyu (Collaborator) commented:

I have asked the developer to fix it.

qnixsynapse (Contributor, Author) commented:

Thank you.

airMeng (Collaborator) commented Apr 7, 2024

Should be fixed in #6521.

qnixsynapse (Contributor, Author) commented Apr 7, 2024

Awesome! Thank you, everyone, for all the hard work. I will (hopefully) try testing a model with this patch today!

qnixsynapse (Contributor, Author) commented:

Update: just tested the recently released iq4_XXS Gemma 7B (8.5B) 1.1 model, and it is working great!

Closing this issue as solved for now.

abhilash1910 (Collaborator) commented:

Thanks for confirming, @akarshanbiswas.
