Hi @tjruwase, any comments on this idea? We are thinking about combining the op builder interface with Triton kernels, which would open the possibility of writing non-Triton kernels for sparse attention while keeping the CUDA path on Triton.
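A rough sketch of the dispatch this combination could enable (all function names here are hypothetical illustrations for this discussion, not DeepSpeed's actual API):

```python
import torch


def _triton_sparse_attention(q, k, v):
    # Stand-in for the existing Triton kernel, i.e. the path that would
    # stay on Triton for CUDA devices.
    raise NotImplementedError("placeholder for the Triton kernel")


def _native_sparse_attention(q, k, v):
    # Stand-in for an accelerator-native kernel loaded through an op
    # builder; a plain PyTorch fallback keeps this sketch runnable.
    scores = torch.matmul(q, k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return torch.matmul(torch.softmax(scores, dim=-1), v)


def sparse_attention(q, k, v):
    # One entry point, two backends: Triton on CUDA, native code elsewhere.
    if q.device.type == "cuda":
        return _triton_sparse_attention(q, k, v)
    return _native_sparse_attention(q, k, v)


q = k = v = torch.randn(2, 4, 8)
out = sparse_attention(q, k, v)  # takes the native path on non-CUDA devices
```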
-
Hi, I noticed that SparseAttention is implemented with Triton for CUDA execution. When we tried to implement SparseAttention on other accelerators, we found that Triton can be a blocker because it is not available on every accelerator. In that case, implementing a SparseAttention OpBuilder and kernels would be a natural option.
I'm wondering whether DeepSpeed could allow the Triton implementation to coexist with an OpBuilder implementation to improve extensibility. The idea is to implement a special PythonBuilder class that lets the module loaded through an OpBuilder call into a Python function; inside that function we can call either a plain Python or a Triton implementation. A demonstration of the concept can be found at the following link.
https://github.com/delock/DeepSpeedSYCLSupport/blob/gma/kernel-python-study/op_builder/cpu/transformer_inference.py
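To make the idea concrete, here is a minimal, self-contained sketch of the PythonBuilder concept, modeled loosely on the linked demonstration; PythonBuilder, SparseAttentionBuilder, and the helper names are hypothetical, not existing DeepSpeed classes:

```python
import torch


def _python_softmax_attention(q, k, v):
    # Plain PyTorch implementation; a CUDA build of the same builder
    # could hand back a Triton kernel here instead.
    scores = torch.matmul(q, k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return torch.matmul(torch.softmax(scores, dim=-1), v)


class PythonBuilder:
    # Base class whose load() returns a Python object rather than a
    # compiled extension, while keeping the contract callers already
    # expect from a C++/CUDA OpBuilder.
    def load(self):
        raise NotImplementedError


class SparseAttentionBuilder(PythonBuilder):
    def load(self):
        class _Ops:
            attention = staticmethod(_python_softmax_attention)

        return _Ops


# Callers are unchanged: they load the op module and call into it without
# knowing whether the backend is compiled code, Triton, or pure Python.
ops = SparseAttentionBuilder().load()
out = ops.attention(torch.randn(2, 4, 8),
                    torch.randn(2, 4, 8),
                    torch.randn(2, 4, 8))
```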
With the OpBuilder reintroduced, DeepSpeed would have the flexibility to implement a function with either Triton or accelerator-native code behind a unified interface. This would improve extensibility on accelerators where Triton is not yet available.