Refactor 2 awq gemm kernels into m16nXk32 #2723

zcnrex · 2024-02-02T06:14:41Z

Merge gemm_forward_4bit_cuda_m16n64k32 and gemm_forward_4bit_cuda_m16n128k32 into a single function, gemm_forward_4bit_cuda_m16nXk32, and pass 64 or 128 as a parameter.

I used a diff tool to identify which numbers differ between the two kernels:
https://www.diffchecker.com/VIc6ukUz/
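
For readers unfamiliar with this kind of refactor, here is a minimal host-side C++ sketch of the idea, not the actual CUDA kernel. It assumes the 64/128 value is supplied as a compile-time template parameter; the names `column_tiles` and `launch_tiles`, and the rule for choosing between the two widths, are hypothetical and only illustrate how two near-identical functions collapse into one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the merged kernel: N plays the role of the "X" in
// m16nXk32 and is fixed at compile time, so one body serves both widths.
template <int N>
std::size_t column_tiles(std::size_t out_channels) {
    static_assert(N == 64 || N == 128, "this sketch covers only 64 and 128");
    // Ceil-divide: number of N-wide tiles needed to cover out_channels columns.
    return (out_channels + N - 1) / N;
}

// Hypothetical dispatcher: prefer the wider tile when it divides evenly.
// This selection rule is an assumption for illustration, not taken from the PR.
std::size_t launch_tiles(std::size_t out_channels) {
    assert(out_channels % 64 == 0);  // assumed precondition of this sketch
    if (out_channels % 128 == 0) {
        return column_tiles<128>(out_channels);
    }
    return column_tiles<64>(out_channels);
}
```

With this shape, the constants that previously differed between the two copies become expressions in `N`, and the caller selects `column_tiles<64>` or `column_tiles<128>` at the call site.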

Next PRs: add cuBLAS as an alternative to awq-gemm, as discussed in #2566 (comment).

zcnrex · 2024-02-03T02:34:28Z

@WoosukKwon could you please help review?

casper-hansen · 2024-02-03T10:11:49Z

@zcnrex Nice job refactoring the GEMM kernels. I am especially excited to see whether dequant + cuBLAS can bring a speedup like we talked about.

WoosukKwon

LGTM! Thanks for the refactoring. Sorry for the late review 🙏

[ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support (vllm-project#2790)
Add documentation on how to do incremental builds (vllm-project#2796)
[Ray] Integration compiled DAG off by default (vllm-project#2471)
Disable custom all reduce by default (vllm-project#2808)
add usage context
removed usage_context from Engine_args
Move IO to another process
added http request
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)
Add documentation section about LoRA (vllm-project#2834)
Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723) Co-authored-by: Chunan Zeng <[email protected]>
Added additional arg for from_engine_args
comments

mengsoso · 2024-08-13T11:31:06Z

@zcnrex @WoosukKwon
Could you help look at this issue: #7400

Chunan Zeng and others added 3 commits February 1, 2024 22:06

Combine 2 awq gemm kernels into m16nXk32

156436b

fix build error

480e3e2

Fix build error

017ad79

zcnrex marked this pull request as ready for review February 3, 2024 00:15

zcnrex changed the title ~~Combine 2 awq gemm kernels into m16nXk32~~ Refactor 2 awq gemm kernels into m16nXk32 Feb 3, 2024

WoosukKwon self-requested a review February 12, 2024 17:59

WoosukKwon approved these changes Feb 12, 2024

WoosukKwon merged commit 5638364 into vllm-project:main Feb 12, 2024
17 checks passed

jvmncs pushed a commit to jvmncs/vllm that referenced this pull request Feb 14, 2024

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

b440270

Co-authored-by: Chunan Zeng <[email protected]>

zcnrex mentioned this pull request Feb 14, 2024

Speed up AWQ 2.5x with updated kernel #2874

Closed

3 tasks

xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2024

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

aa340c6

Co-authored-by: Chunan Zeng <[email protected]>

xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 22, 2024

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

f61cb9a

Co-authored-by: Chunan Zeng <[email protected]>

andy-neuma mentioned this pull request Feb 23, 2024

andy/bump main to v0.3.2 neuralmagic/nm-vllm#49

Closed

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

b616482

Co-authored-by: Chunan Zeng <[email protected]>

mgoin mentioned this pull request Aug 11, 2024

[Bug]: Bug in quantization/awq /gemm_kernels.cu gemm_forward_4bit_cuda_m16nXk32 More result have been write #7400

Open

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

0522020

Co-authored-by: Chunan Zeng <[email protected]>
