Semi-structured 2:4 sparsity via SparseSemiStructuredTensor #4
Conversation
…anch safe_expose_semi_structured_sparse_tensor
We shouldn't add
Yes, I believe this is how I have it set up right now; the user does a manual install.
The most recent commit adds bfloat16 as a supported dtype for the vLLM 2:4 sparsity profile. Note #1: no changes to magic_wand were required for bfloat16 + 2:4 support; it turns out I parameterized the magic_wand 2:4 implementation by dtype, so bfloat16 works automatically. (I can see that inference produces a slightly different result for bfloat16 than for float16, which makes sense: the token probabilities are likely shifted by differences in rounding error between the two datatypes.) Note #2: the BE sparsity profile does not list bfloat16 as a supported dtype, and I will not investigate bfloat16 support in BE in this PR, owing to the potential complexity involved in, e.g., making the bitmask-expand process generalize to dtypes beyond float16. @LucasWilkinson could potentially investigate this if it is a priority.
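As a minimal sketch of why bfloat16 works with the same call pattern as float16, here is the analogous flow through PyTorch's own SparseSemiStructuredTensor (named in the PR title). This is not the magic_wand API; the shapes, the mask construction, and the _FORCE_CUTLASS flag are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch.sparse import to_sparse_semi_structured, SparseSemiStructuredTensor

# Sketch only: requires a CUDA device with sparse tensor cores (Ampere+).
SparseSemiStructuredTensor._FORCE_CUTLASS = True  # assumption: CUTLASS backend

dtype = torch.bfloat16  # swap in torch.float16 and the rest is unchanged
# Build a weight with a 2:4 pattern: two zeros in every contiguous group of four.
mask = torch.tensor([0, 0, 1, 1], device="cuda").tile(128, 32).bool()
weight = (torch.rand(128, 128, device="cuda") * mask).to(dtype)

sparse_weight = to_sparse_semi_structured(weight)
x = torch.rand(64, 128, device="cuda", dtype=dtype)
output = F.linear(x, sparse_weight)  # same code path for fp16 and bf16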
if bias is not None:
    output = F.linear(x, sparse_weight, bias)
else:
    output = F.linear(x, sparse_weight, bias=None)
Why do we need this if? If bias is None, why not just pass bias?
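Indeed, F.linear treats bias=None as "no bias", so the branch collapses to a single call:

# Equivalent to the if/else above: F.linear handles bias=None itself.
output = F.linear(x, sparse_weight, bias)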
LGTM
magic_wand semi_structured_sparse_tensor_linear branch integrates 2:4 semi-structured sparsity into SparseTensor. This PR adds a new sparsity config for 2:4 sparsity to neuralmagic-vllm, using the SparseTensor 2:4 support. This PR also refactors the sparse linear method into a separate file, vllm/model_executor/layers/sparsity/sparse_w16a16_linear_method.py, which supports all sparsity formats.
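For readers unfamiliar with the layout this describes, here is a hypothetical sketch of the split between the 2:4 sparsity config and the shared linear method. Only the file layout comes from the PR text; every class and method name below is an assumption for illustration, not the PR's actual code.

import torch
import torch.nn.functional as F

class SparseW16A16LinearMethod:
    """Hypothetical shared linear method (sparse_w16a16_linear_method.py):
    one apply path that works for any sparse weight format usable by F.linear."""

    def apply_weights(self, sparse_weight, x, bias=None):
        return F.linear(x, sparse_weight, bias)

class SemiStructuredSparseW16A16Config:
    """Hypothetical 2:4 sparsity config of the kind this PR adds."""

    @staticmethod
    def get_name() -> str:
        return "semi_structured_sparse_w16a16"

    @staticmethod
    def get_supported_act_dtypes():
        # float16 and bfloat16, per the bfloat16 commit discussed above.
        return [torch.float16, torch.bfloat16]

    def get_linear_method(self) -> SparseW16A16LinearMethod:
        return SparseW16A16LinearMethod()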