Add matmul int4 for CUDA #17526

yufenglee · 2023-09-12T20:56:06Z

Description

Motivation and Context

onnxruntime/python/tools/quantization/matmul_weight_compress_quantizer.py

onnxruntime/python/tools/kernel_explorer/kernels/matmul_fp_int4.py

onnxruntime/contrib_ops/cpu/quantization/dequantize_blockwise.h

+        int32_t k = k_block_idx * block_size;
+        const BlockwiseQuantBlock<T, block_size, bits>* blob_ptr = src_blob + task_idx;
+        if (nullptr != zero_points) {
+          blob_ptr->dequant(dst + n * K + k, scale[task_idx], zero_points[task_idx], k, K);
+        } else {
+          blob_ptr->dequant(dst + n * K + k, scale[task_idx], k, K);
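
For readers without the full diff, here is a minimal sketch of what the two `dequant` overloads called in this hunk might do. It is not the code from this PR. `Int4Block` and `dequantize_blockwise` are hypothetical stand-ins, the element type is fixed to `float` instead of `T`, and three things are assumptions rather than facts taken from the hunk: values are packed two per byte with the low nibble first, the overload without a zero point uses 8 (the midpoint of the unsigned 4-bit range), and `task_idx` is `n * blocks_per_row + k_block_idx`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BlockwiseQuantBlock with bits == 4.
// Assumed layout: two 4-bit values per byte, low nibble first.
template <int32_t block_size>
struct Int4Block {
  static_assert(block_size % 2 == 0, "two 4-bit values are packed per byte");
  static constexpr uint8_t kDefaultZeroPoint = 8;  // assumed symmetric default
  uint8_t data[block_size / 2];

  // Asymmetric case: x = scale * (q - zp).
  // k is the block's first column and K the row length, so a partial
  // final block stops at K instead of writing past the end of the row.
  void dequant(float* dst, float scale, uint8_t zp, int32_t k, int32_t K) const {
    assert(k >= 0 && k < K);
    const int32_t count = std::min<int32_t>(block_size, K - k);
    for (int32_t i = 0; i < count; ++i) {
      const uint8_t byte = data[i / 2];
      const uint8_t q = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
      dst[i] = scale * (static_cast<float>(q) - static_cast<float>(zp));
    }
  }

  // Symmetric case: no stored zero point, fall back to the assumed default.
  void dequant(float* dst, float scale, int32_t k, int32_t K) const {
    dequant(dst, scale, kDefaultZeroPoint, k, K);
  }
};

// Dequantize an N x K row-major matrix stored as one block per
// (row, k_block_idx) pair. zero_points may be nullptr, which selects
// the symmetric overload, mirroring the branch in the hunk above.
template <int32_t block_size>
std::vector<float> dequantize_blockwise(const Int4Block<block_size>* src_blob,
                                        const float* scale,
                                        const uint8_t* zero_points,
                                        int32_t N, int32_t K) {
  const int32_t blocks_per_row = (K + block_size - 1) / block_size;
  std::vector<float> dst(static_cast<std::size_t>(N) * K);
  for (int32_t n = 0; n < N; ++n) {
    for (int32_t k_block_idx = 0; k_block_idx < blocks_per_row; ++k_block_idx) {
      const int32_t task_idx = n * blocks_per_row + k_block_idx;  // assumed mapping
      const int32_t k = k_block_idx * block_size;
      const Int4Block<block_size>* blob_ptr = src_blob + task_idx;
      float* out = dst.data() + n * K + k;
      if (nullptr != zero_points) {
        blob_ptr->dequant(out, scale[task_idx], zero_points[task_idx], k, K);
      } else {
        blob_ptr->dequant(out, scale[task_idx], k, K);
      }
    }
  }
  return dst;
}
```

Passing `k` and `K` into `dequant` is what lets the last block of a row be shorter than `block_size` when `K` is not a multiple of it. The sketch clips with `std::min`; the real kernel may handle the tail differently.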


onnxruntime/python/tools/kernel_explorer/kernels/dequantize_int4.py

onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py

onnxruntime/test/python/quantization/test_op_matmul_4bits.py

onnxruntime/python/onnxruntime_pybind_quant.cc

onnxruntime/test/python/quantization/test_quantizeblockwise_4bits.py

onnxruntime/test/contrib_ops/matmul_with_quant_weight_test.cc

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_blockwise_int4.py

yufenglee · 2023-10-11T18:03:26Z

A clean one: #17890

github-advanced-security bot found potential problems Sep 12, 2023

onnxruntime/python/tools/quantization/matmul_weight_compress_quantizer.py (fixed)

yufenglee force-pushed the yufeng/matmul_int4 branch from 690f452 to caa83a0 on September 12, 2023 21:25

github-advanced-security bot found potential problems Sep 12, 2023

yufenglee force-pushed the yufeng/matmul_int4 branch from d026123 to caa83a0 on September 14, 2023 01:36

github-advanced-security bot found potential problems Sep 18, 2023

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_int4.py (fixed)

github-advanced-security bot found potential problems Sep 18, 2023

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_int4.py (fixed)

github-advanced-security bot found potential problems Sep 18, 2023

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_int4.py (fixed)

github-advanced-security bot found potential problems Sep 18, 2023

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_int4.py (fixed)

github-advanced-security bot found potential problems Sep 26, 2023

onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py (fixed)

onnxruntime/test/python/quantization/test_op_matmul_4bits.py (fixed)

yufenglee force-pushed the yufeng/matmul_int4 branch from 43f73ce to dba6759 on September 26, 2023 05:16

github-advanced-security bot found potential problems Sep 27, 2023

onnxruntime/python/onnxruntime_pybind_quant.cc (fixed)

github-advanced-security bot found potential problems Sep 28, 2023

onnxruntime/test/python/quantization/test_quantizeblockwise_4bits.py (fixed)

github-advanced-security bot found potential problems Sep 28, 2023

onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py (fixed)

yufenglee force-pushed the yufeng/matmul_int4 branch from fa73b74 to a1977f8 on October 9, 2023 17:39

github-advanced-security bot found potential problems Oct 9, 2023

onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py (fixed)

github-advanced-security bot found potential problems Oct 9, 2023

onnxruntime/test/python/quantization/test_op_matmul_4bits.py (fixed)

yufenglee force-pushed the yufeng/matmul_int4 branch from 3b59970 to b73cf79 on October 9, 2023 18:50

github-advanced-security bot found potential problems Oct 9, 2023

onnxruntime/python/tools/kernel_explorer/kernels/dequantize_blockwise_int4.py (fixed)

yufenglee force-pushed the yufeng/matmul_int4 branch from f988561 to 64f5aaf on October 9, 2023 22:30

yufenglee added 8 commits October 10, 2023 20:37

int4 support on GPU (2b96e30)
change quant tool (7be5564)
use fp32 as accumulator (2d8c8f9)
refine the matmul_int4 kernel (19ecb96)
refine the matmul_int4 kernel (0a01dce)
refine benchmark tool (74c6b80)
refine the dequant int4 (0c3f6c5)
optimize dequant (155d4b2)

yufenglee added 20 commits October 10, 2023 20:37

refine quant tool (3ff52ed)
add pybind for blockwise quant (453207f)
fix build breaks (1c7f9d5)
refine quant tool (7eba97e)
fix fp16 (dda154f)
add unit test for QuantBlockwise pybind (2050392)
refine the quant tool (c015d4d)
add option to exclude logit layer (2034ac9)
handle subgraph properly (8d972dd)
fix build break (78d0f02)
change matmul 4bits name (068f4db)
change zp to 4bits (fef4216)
revert change in matmul_weight4_quantizer.py (3375903)
format code (4d8dbc0)
fix build/test breaks (9585dca)
fix break break (68191b7)
fix test failures in training pipeline (bccf3d7)
fix build break in training CIs (a3abfcd)
fix training CIs (4e29283)
fix training CIs (d4e4145)

yufenglee force-pushed the yufeng/matmul_int4 branch from e8d0497 to d4e4145 on October 11, 2023 05:39

yufenglee closed this Oct 11, 2023
