This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Use naive decompress for SM<8.0 #32

Merged
merged 5 commits into main on Feb 21, 2024

Conversation

@mgoin (Member) commented on Feb 20, 2024

A warning is printed when this fallback is triggered:

```
WARNING 02-20 22:21:27 sparse_w16a16.py:32] Unstructured sparse kernels are not optimized for NVIDIA SM < 8.0. Naive decompress kernels will be used and can be slower than dense models
```
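As a rough sketch of the kind of capability check that gates this fallback (the function name and return values here are hypothetical; the actual vLLM logic lives in `sparse_w16a16.py` and may differ):

```python
def choose_decompress_kernel(device_capability):
    """Pick a kernel path from the GPU's (major, minor) compute capability.

    Hypothetical helper for illustration only; SM < 8.0 (e.g. a T4,
    which is SM 7.5) takes the naive decompress path.
    """
    major, _minor = device_capability
    if major < 8:
        print("WARNING: Unstructured sparse kernels are not optimized for "
              "NVIDIA SM < 8.0. Naive decompress kernels will be used.")
        return "naive_decompress"
    return "optimized_sparse"
```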

Works on a T4 with:

```python
from vllm import LLM, SamplingParams

model = LLM(
    "nm-testing/opt-125m-pruned2.4",
    sparsity="sparse_w16a16",
    enforce_eager=True,
    dtype="float16",
)

sampling_params = SamplingParams(max_tokens=100, temperature=0)
outputs = model.generate("Hello my name is", sampling_params=sampling_params)
outputs[0].outputs[0].text
```

Tested in Colab: https://colab.research.google.com/drive/15xRvWX5gNaTb00BcaXhxwMm6yxavIKGN?usp=sharing
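For intuition, a naive decompress amounts to scattering the stored nonzero values back into a dense matrix according to a bitmask. This is a simplified pure-Python illustration, not the actual CUDA kernel:

```python
def naive_decompress(values, bitmask, shape):
    """Expand a bitmask-compressed sparse weight back to a dense matrix.

    values:  nonzero entries, in row-major order
    bitmask: one bool per dense position (True = position holds a value)
    shape:   (rows, cols) of the dense matrix
    """
    rows, cols = shape
    assert len(bitmask) == rows * cols
    assert sum(bitmask) == len(values)

    dense = [0.0] * (rows * cols)
    it = iter(values)
    for i, present in enumerate(bitmask):
        if present:
            dense[i] = next(it)  # scatter the next stored value
    # reshape the flat buffer into rows
    return [dense[r * cols:(r + 1) * cols] for r in range(rows)]
```

The real kernels avoid this full materialization on SM >= 8.0 by using sparse tensor cores; on older GPUs the decompressed dense weight is used directly, which is why the fallback can be slower than a dense model.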

@LucasWilkinson (Collaborator) left a comment:

LGTM, thanks for doing this!

@mgoin mgoin merged commit b61bc82 into main Feb 21, 2024
2 checks passed
@mgoin mgoin deleted the support-bitmask-fallback branch February 21, 2024 00:11
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
tlrmchlsmth pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024