[SYCL] Add q3_s and q1_s #5886

Merged (9 commits, Mar 11, 2024)

Conversation

@abhilash1910 (Collaborator)

Support GGML_TYPE_IQ3_S and GGML_TYPE_IQ1_S in mul_mat/dequant.
cc @NeoZhangJianyu @airMeng @ggerganov

@airMeng (Collaborator) commented Mar 5, 2024

Have you run test-backend-ops or verified at the model level?

@NeoZhangJianyu (Collaborator) commented Mar 6, 2024

  1. Please update ggml_backend_sycl_supports_op() to support the new types,
    and run ci/run.sh to test it. (A sketch of such a change follows below.)

  2. How do you test the new types? With which model or test case?
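A minimal sketch of the change item 1 asks for, assuming the backend reports MUL_MAT support based on the quantized type of src0; the actual ggml_backend_sycl_supports_op() in ggml-sycl.cpp may be structured differently:

static bool ggml_backend_sycl_supports_op(ggml_backend_t backend, const ggml_tensor * op) {
    (void) backend;
    switch (op->op) {
        case GGML_OP_MUL_MAT:
            // Report support based on the weight tensor's type.
            switch (op->src[0]->type) {
                case GGML_TYPE_F16:
                case GGML_TYPE_Q4_0:
                // ... other types the backend already handles ...
                case GGML_TYPE_IQ3_S: // newly added
                case GGML_TYPE_IQ1_S: // newly added
                    return true;
                default:
                    return false;
            }
        // ... remaining ops ...
        default:
            return false;
    }
}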

@AlexKoff88

@abhilash1910, do you have any performance estimates for the supported types in this PR?

@abhilash1910 (Collaborator, Author)

  1. Please update ggml_backend_sycl_supports_op() to support the new types,
    and run ci/run.sh to test it.
  2. How do you test the new types? With which model or test case?

For:

  1. Handled.
  2. I think q3_s should support Mistral along with llama v1/v2, and the same for the 1.5-bit q1_s. test-backend-ops passes.

@NeoZhangJianyu (Collaborator) commented Mar 8, 2024


Great!

The UT will cover every op, but real inference depends on the conditions.
Have you tested the new types with a Mistral model, and did they pass?

@abhilash1910 (Collaborator, Author)


  • Tested on Mistral 7B without errors, functionality-wise.

@ggerganov (Owner) left a comment

I can't give this a test, but it seems OK.

Btw, I might need some of your help in #5940 to move the tables with quantum constants to the new ggml-common.h header. I will make the change and ping you to give it a try and confirm that it works.

@qnixsynapse (Contributor) commented Mar 9, 2024

@abhilash1910 increasing the grid space seems to fix the regression. However, IQ3_S is still throwing a GGML_ASSERT:
ggml-sycl.cpp:14654: false

Probably around here:

inline void ggml_sycl_op_dequantize_mul_mat_vec(
    const ggml_tensor *src0, const ggml_tensor *src1, ggml_tensor *dst,
    const char *src0_dd_i, const float *src1_ddf_i, const char *src1_ddq_i,
    float *dst_dd_i, const int64_t row_low, const int64_t row_high,
    const int64_t src1_ncols, const int64_t src1_padded_row_size,
    const dpct::queue_ptr &stream) {

    const int64_t ne00 = src0->ne[0];
    const int64_t row_diff = row_high - row_low;

    GGML_ASSERT(src1->type == GGML_TYPE_F32);

    // on some GPUs it is faster to convert src1 to half and to use half precision intrinsics
#ifdef GGML_SYCL_F16
    sycl_pool_alloc<sycl::half> src1_dfloat_a;
    sycl::half *src1_dfloat = nullptr; // dfloat == half

    bool src1_convert_f16 =
        src0->type == GGML_TYPE_Q4_0 || src0->type == GGML_TYPE_Q4_1 ||
        src0->type == GGML_TYPE_Q5_0 || src0->type == GGML_TYPE_Q5_1 ||
        src0->type == GGML_TYPE_Q8_0 || src0->type == GGML_TYPE_F16;

    if (src1_convert_f16) {
        src1_dfloat = src1_dfloat_a.alloc(ne00);
        const to_fp16_sycl_t to_fp16_sycl = ggml_get_to_fp16_sycl(src1->type);
        GGML_ASSERT(to_fp16_sycl != nullptr);
        to_fp16_sycl(src1_ddf_i, src1_dfloat, ne00, stream);
    }
#else
    const dfloat * src1_dfloat = (const dfloat *) src1_ddf_i; // dfloat == float, no conversion
#endif // GGML_SYCL_F16

    switch (src0->type) {
        case GGML_TYPE_Q4_0:
            dequantize_mul_mat_vec_q4_0_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q4_1:
            dequantize_mul_mat_vec_q4_1_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q5_0:
            dequantize_mul_mat_vec_q5_0_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q5_1:
            dequantize_mul_mat_vec_q5_1_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q8_0:
            dequantize_mul_mat_vec_q8_0_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q2_K:
            dequantize_mul_mat_vec_q2_K_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q3_K:
            dequantize_mul_mat_vec_q3_K_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q4_K:
            dequantize_mul_mat_vec_q4_K_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q5_K:
            dequantize_mul_mat_vec_q5_K_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_Q6_K:
            dequantize_mul_mat_vec_q6_K_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_F16:
            convert_mul_mat_vec_f16_sycl(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        default:
            GGML_ASSERT(false);
            break;
    }

    (void) src1;
    (void) dst;
    (void) src1_ddq_i;
    (void) src1_ncols;
    (void) src1_padded_row_size;
}
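That is, src0->type == GGML_TYPE_IQ3_S (or GGML_TYPE_IQ1_S) falls through to the default branch and trips GGML_ASSERT(false). A sketch of the cases the switch would need; the *_sycl wrapper names follow the file's existing naming pattern and are assumptions, not code from the PR:

        case GGML_TYPE_IQ3_S:
            dequantize_mul_mat_vec_iq3_s_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_IQ1_S:
            dequantize_mul_mat_vec_iq1_s_sycl(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;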

@airMeng (Collaborator) commented Mar 11, 2024

  @abhilash1910 increasing the grid space seems to fix the regression. However, IQ3_S is still throwing a GGML_ASSERT: ggml-sycl.cpp:14654: false

Echoing this: it can't work at the model level.

@abhilash1910 merged commit ef3ced2 into master Mar 11, 2024
64 checks passed
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
* Add q3_s and q1_s

* fix compilation

* fix build

* fix build

* fix build

* enable ops

* rm macro

* increase grid space
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024 (same squashed commit message as above)
@NeoZhangJianyu (Collaborator)

  @abhilash1910 increasing the grid space seems to fix the regression. However, IQ3_S is still throwing a GGML_ASSERT: ggml-sycl.cpp:14654: false

  Echoing this: it can't work at the model level.

@abhilash1910 Could you check and fix this issue?

@abhilash1910 (Collaborator, Author)


Fix in progress at #6052.

@ikawrakow (Contributor) left a comment

Please remove this incorrect implementation.

const int ib = tid%8; // 0...7
dst_t * y = yy + i*QK_K + 32*ib + 8*il;
const uint8_t * qs = x[i].qs + 8*ib;
const uint8_t * grid1 = (const uint8_t *)(iq3s_grid + qs[2*il+0]);
Contributor

This is wrong. When you copy-paste my code without attribution, please make sure you are copy-pasting the correct code.

@abhilash1910 (Collaborator, Author)

I think we are currently reviewing this. In terms of "attribution", I would suggest that we follow the CUDA code to adapt it to our SYCL backend, and since some parts of the code base are almost similar, I do not find a reason to be defensive about it.
Like I said before, we are working on this because not all CUDA code is applicable for us. I hope this makes communication easier.
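For context on what the reviewer flags: in the corresponding CUDA kernel the iq3s_grid index is 9 bits wide, with the high bit folded in from the block's qh field, whereas the hunk above indexes the grid with qs alone. A sketch of the CUDA-style lookup, reconstructed from memory; treat the exact bit arithmetic as an assumption and see PR #6014 for the authoritative version:

    // The grid index combines 8 low bits from qs with a 9th bit from qh;
    // dropping the qh bit (as in the flagged hunk) selects the wrong grid entry.
    const uint8_t * grid1 = (const uint8_t *)(iq3s_grid + (qs[2*il+0] | ((x[i].qh[ib] << (8 - 2*il)) & 256)));
    const uint8_t * grid2 = (const uint8_t *)(iq3s_grid + (qs[2*il+1] | ((x[i].qh[ib] << (7 - 2*il)) & 256)));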

const block_iq1_s * x = (const block_iq1_s *) vx;

const int tid = item_ct1.get_local_id(2);
#if QK_K == 256
Contributor

This is wrong. Please see PR #6014 for the correct implementation.

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024 (same squashed commit message as above)