This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Semi-structured 2:4 sparsity via SparseSemiStructuredTensor #4

Merged
merged 17 commits into main from semi_structured on Feb 13, 2024

Conversation

@afeldman-nm afeldman-nm commented Feb 5, 2024

The magic_wand semi_structured_sparse_tensor_linear branch integrates 2:4 semi-structured sparsity into SparseTensor. This PR adds a new sparsity config for 2:4 sparsity to neuralmagic-vllm, using that SparseTensor 2:4 support.

This PR also refactors the sparse linear method into a separate file, vllm/model_executor/layers/sparsity/sparse_w16a16_linear_method.py, which supports all sparsity formats.
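
For background, here is a minimal sketch of the 2:4 semi-structured format using PyTorch's built-in SparseSemiStructuredTensor (it assumes torch >= 2.1 and an NVIDIA GPU with sparse tensor cores, e.g. Ampere or newer; shapes and values are illustrative only, and magic_wand's SparseTensor is a separate implementation of the same layout):

```python
import torch
import torch.nn.functional as F
from torch.sparse import to_sparse_semi_structured

# Build a weight that obeys the 2:4 pattern: at most two nonzeros
# in every contiguous group of four elements along a row.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile(256, 64)
weight = torch.randn(256, 256, dtype=torch.float16, device="cuda") * mask

# Compress to the hardware-accelerated 2:4 representation.
sparse_weight = to_sparse_semi_structured(weight)

x = torch.randn(8, 256, dtype=torch.float16, device="cuda")
output = F.linear(x, sparse_weight)  # dispatches to a sparse matmul kernel
```

On supported hardware the compressed tensor stores half of the values plus a small metadata tensor, and the matmul runs on the sparse tensor cores.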

@LucasWilkinson
Collaborator

We shouldn't add magic_wand as a submodule; it will become a pip requirement once we get magic_wand on pip. For now I think it's fine to just have the user install it manually (i.e. nothing in this repo enforces that it is installed).

@afeldman-nm
Author

> We shouldn't add magic_wand as a submodule; it will become a pip requirement once we get magic_wand on pip. For now I think it's fine to just have the user install it manually (i.e. nothing in this repo enforces that it is installed).

Yes, I believe that's how it's set up right now: the user installs it manually.

@afeldman-nm
Author

afeldman-nm commented Feb 9, 2024

The most recent commit adds bfloat16 as a supported dtype for the vLLM 2:4 sparsity profile.

Note #1: no changes to magic_wand were required for bfloat16 + 2:4 support. It turns out I had parameterized the magic_wand 2:4 implementation by dtype, so bfloat16 works automatically. (I can also see that inference produces a slightly different result for bfloat16 than for float16, which makes sense: the token probabilities are presumably shifted by differences in rounding error between the two datatypes.)
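
As a rough illustration of what dtype parameterization buys here (a hypothetical sketch, not the actual magic_wand code; PyTorch's torch.sparse.to_sparse_semi_structured stands in for the magic_wand kernel, and the helper name is made up):

```python
import torch
from torch.sparse import to_sparse_semi_structured

def compress_2to4(weight: torch.Tensor):
    """Hypothetical helper: the dtype is taken from the input tensor,
    so float16 and bfloat16 go down the same code path."""
    assert weight.dtype in (torch.float16, torch.bfloat16)
    return to_sparse_semi_structured(weight)

# 2:4 mask: two nonzeros in every contiguous group of four elements.
mask = torch.tensor([1, 1, 0, 0], dtype=torch.bool, device="cuda").tile(256, 64)
for dtype in (torch.float16, torch.bfloat16):
    w = torch.randn(256, 256, dtype=dtype, device="cuda") * mask
    sparse_w = compress_2to4(w)  # same call for both dtypes
```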

Note #2: the BE sparsity profile does not list bfloat16 as a supported dtype. I will not investigate bfloat16 support in BE in this PR, owing to the potential complexity involved in, e.g., making the bitmask-expand process generalize to dtypes beyond float16. @LucasWilkinson could potentially investigate this if it is a priority.

if bias is not None:
    output = F.linear(x, sparse_weight, bias)
else:
    output = F.linear(x, sparse_weight, bias=None)
Collaborator

Why do we need this if? If bias is None, why not just pass bias?
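
For reference, torch.nn.functional.linear's bias parameter defaults to None, and None means "no bias", so the two branches above are equivalent to a single call (a minimal sketch with made-up shapes):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)
weight = torch.randn(32, 16)
bias = None  # or a tensor of shape (32,)

# Passing bias through directly covers both cases, making the
# if/else branch above redundant.
output = F.linear(x, weight, bias)
```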

Outdated review threads (resolved): vllm/config.py, vllm/model_executor/layers/sparsity/__init__.py

@LucasWilkinson LucasWilkinson left a comment


LGTM

@afeldman-nm afeldman-nm merged commit 6075c74 into main Feb 13, 2024
tlrmchlsmth pushed a commit that referenced this pull request Feb 13, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 20, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 20, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
tlrmchlsmth pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 21, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024
robertgshaw2-neuralmagic pushed a commit that referenced this pull request Feb 22, 2024
@afeldman-nm afeldman-nm deleted the semi_structured branch August 7, 2024 04:47