This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Semi-structured 2:4 sparsity via SparseSemiStructuredTensor #4

Merged
merged 17 commits into main from semi_structured on Feb 13, 2024

Conversation

@afeldman-nm afeldman-nm commented Feb 5, 2024

The magic_wand semi_structured_sparse_tensor_linear branch integrates 2:4 semi-structured sparsity into SparseTensor. This PR adds a new sparsity config for 2:4 sparsity to neuralmagic-vllm, using that SparseTensor 2:4 support.

This PR also refactors the sparse linear method into a separate file, vllm/model_executor/layers/sparsity/sparse_w16a16_linear_method.py, which supports all sparsity formats.
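
For background, here is a minimal sketch of the 2:4 semi-structured format using PyTorch's built-in SparseSemiStructuredTensor (it assumes torch >= 2.1 and an NVIDIA GPU with sparse tensor cores, e.g. Ampere or newer; shapes and values are illustrative only, and magic_wand's SparseTensor is a separate implementation of the same layout):

```python
import torch
import torch.nn.functional as F
from torch.sparse import to_sparse_semi_structured

# Build a weight that obeys the 2:4 pattern: at most two nonzeros
# in every contiguous group of four elements along a row.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile(256, 64)
weight = torch.randn(256, 256, dtype=torch.float16, device="cuda") * mask

# Compress to the hardware-accelerated 2:4 representation.
sparse_weight = to_sparse_semi_structured(weight)

x = torch.randn(8, 256, dtype=torch.float16, device="cuda")
output = F.linear(x, sparse_weight)  # dispatches to a sparse matmul kernel
```

On supported hardware the compressed tensor stores half of the values plus a small metadata tensor, and the matmul runs on the sparse tensor cores.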

@LucasWilkinson
Collaborator

We shouldn't add magic_wand as a submodule; it will become a pip requirement once we get magic_wand on pip. For now I think it's fine to just have the user install it manually (i.e. nothing in this repo enforces that it is installed).

@afeldman-nm
Author

> We shouldn't add magic_wand as a submodule; it will become a pip requirement once we get magic_wand on pip. For now I think it's fine to just have the user install it manually (i.e. nothing in this repo enforces that it is installed).

Yes, I believe that's how it's set up right now: the user installs it manually.

@afeldman-nm
Author

afeldman-nm commented Feb 9, 2024

The most recent commit adds bfloat16 as a supported dtype for the vLLM 2:4 sparsity profile.

Note #1: no changes to magic_wand were required for bfloat16 + 2:4 support. It turns out I had parameterized the magic_wand 2:4 implementation by dtype, so bfloat16 works automatically. (I can also see that inference produces a slightly different result for bfloat16 than for float16, which makes sense: the token probabilities are presumably shifted by differences in rounding error between the two datatypes.)
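
As a rough illustration of what dtype parameterization buys here (a hypothetical sketch, not the actual magic_wand code; PyTorch's torch.sparse.to_sparse_semi_structured stands in for the magic_wand kernel, and the helper name is made up):

```python
import torch
from torch.sparse import to_sparse_semi_structured

def compress_2to4(weight: torch.Tensor):
    """Hypothetical helper: the dtype is taken from the input tensor,
    so float16 and bfloat16 go down the same code path."""
    assert weight.dtype in (torch.float16, torch.bfloat16)
    return to_sparse_semi_structured(weight)

# 2:4 mask: two nonzeros in every contiguous group of four elements.
mask = torch.tensor([1, 1, 0, 0], dtype=torch.bool, device="cuda").tile(256, 64)
for dtype in (torch.float16, torch.bfloat16):
    w = torch.randn(256, 256, dtype=dtype, device="cuda") * mask
    sparse_w = compress_2to4(w)  # same call for both dtypes
```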

Note #2: the BE sparsity profile does not list bfloat16 as a supported dtype. I will not investigate bfloat16 support in BE in this PR, owing to the potential complexity involved in, e.g., making the bitmask-expand process generalize to dtypes beyond float16. @LucasWilkinson could potentially investigate this if it is a priority.

if bias is not None:
    output = F.linear(x, sparse_weight, bias)
else:
    output = F.linear(x, sparse_weight, bias=None)
Collaborator

Why do we need this if? If bias is None, why not just pass bias?
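
For reference, torch.nn.functional.linear's bias parameter defaults to None, and None means "no bias", so the two branches above are equivalent to a single call (a minimal sketch with made-up shapes):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)
weight = torch.randn(32, 16)
bias = None  # or a tensor of shape (32,)

# Passing bias through directly covers both cases, making the
# if/else branch above redundant.
output = F.linear(x, weight, bias)
```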

Outdated review threads (resolved): vllm/config.py, vllm/model_executor/layers/sparsity/__init__.py

@LucasWilkinson LucasWilkinson left a comment


LGTM

@afeldman-nm afeldman-nm merged commit 6075c74 into main Feb 13, 2024
tlrmchlsmth pushed a commit that referenced this pull request Feb 13, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 20, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 20, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
tlrmchlsmth pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024
@afeldman-nm afeldman-nm deleted the semi_structured branch August 7, 2024 04:47