Fast Multi-head Attention support on AMD ROCM #978
Merged
540 commits
bc23333
add option to build a standalone runner for splitk decoder; debugging…
tenpercent 2c7b9bb
fix a few bugs
tenpercent 709727f
fix an indexing bug
tenpercent 785481c
stash changes
tenpercent ff0ebdb
Add benchmark_mem_eff_attn_mqa_gqa_ck_tiled.py to benchmark mqa/gqa p…
qianfengz 9a8baf7
Synchronize with latest update in composable_kernel_tiled feature/fmh…
qianfengz 959ae7f
Tiny fix in benchmark_mem_eff_attn_mqa_gqa_ck_tiled.py
qianfengz cc2f487
Synchronize with latest update in composable_kernel_tiled and make al…
qianfengz 2162b45
Switch to new branch for composable_kernel_tiled submodule
qianfengz d6cf545
Add bfp16 instances for ck-tiled inference
qianfengz 5cfda98
Update to test and benchmark scripts to include bfloat16
qianfengz ab60547
Tiny update to ck_tiled kernel
qianfengz a2af789
Change to benchmark_mem_eff_attn_mqa_gqa_ck_tiled benchmark cases
qianfengz d957dd9
stash changes
tenpercent 40aa884
Use Async pipeline for no M/N0K1 padding cases
qianfengz 73e97d8
Add CF_FMHA_FWD_FAST_EXP2 to building
qianfengz b0c7023
Add Triton FA2 forward op
sgrigory 63c3523
Add Triton Flash Attention 2 to benchmarks
sgrigory fbd836a
Synchronize with latest third_party/composable_kernel and remove the …
qianfengz 0d15f1b
stash split attention testing wip
tenpercent 5c1bc54
Synchronize with latest third_party/composable_kernel again
qianfengz 0172147
Merge branch 'develop' into ck-tiled-fa
qianfengz a018550
Synchronize with latest third_party/composable_kernel_tiled
qianfengz 31da32e
Change to make ck decoder buildable with both ck tiled or non-tiled f…
qianfengz 22c8d6f
Change to make ck decoder buildable with both ck tiled or non-tiled f…
qianfengz 6428374
fix gqa for split-k=1
tenpercent f21e39a
Skip backward tests, fix import
sgrigory 6c5540c
fix the mask for decoding; row max and lse are computed correctly; de…
tenpercent 5225eef
make libtorch split-1 decoder implementation pass numerical correctness
tenpercent 45727d6
Disable CK kernel for large shapes, better catch OOMs
sgrigory de5098e
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/xfo…
qianfengz 402ee91
Actually remove submodule composable_kernel_tiled from the branch
qianfengz 7904096
Change the domain for the repo of composable_kernel submodule to ROCm
qianfengz defb8d9
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/xfo…
qianfengz 388a5ca
Merge pull request #5 from ROCmSoftwarePlatform/merge-upstream-merge
qianfengz b068558
Merge branch 'develop' into ck-tiled-fa
qianfengz 44f6160
Update to validate_inputs() in common.py to support 4d mqa/gqa
qianfengz e03f67a
synchronize test_mem_eff_attention_ck.py with test_mem_eff_attention.py
qianfengz 6aef46d
Tiny update in benchmark_mem_eff_attn_decoder_ck.py
qianfengz 4a1cea0
Synchronize benchmark_mem_eff_attention_ck.py with benchmark_mem_eff_…
qianfengz ad024e4
Merge branch 'develop' into ck-tiled-fa
qianfengz c5ca494
Remove benchmark_mem_eff_attn_decoder_ck_tiled.py
qianfengz a74ee16
Merge branch 'develop' into decoder-splitk
tenpercent 8ebfd5f
Support for Generic Attention Mask Coordinate
qianfengz 43e7797
Merge pull request #6 from sgrigory/add-triton-fa2
qianfengz ba5fd52
Add ck.FwOp and ck.BwOp to dispatched operations
qianfengz 6533aca
Add ck.FwOp and ck.BwOp to ALL_FW_OPS and ALL_BW_OPS
qianfengz 7fc3620
Update in tests/readme_test_on_rocm.txt
qianfengz 23e191a
Add ckF and ck_decoder to benchmark_mem_eff_attn_decoder.py
qianfengz b077cfc
Merge branch 'develop' into ck-tiled-fa
qianfengz 45287b7
Synchronize with the latest ck-tiled commits
qianfengz 1a74675
Add is_ck_tiled_used() c++ extension interface for judging if ck-tile…
qianfengz cbcc196
Remove composable_kernel_tiled submodule
qianfengz b4539f7
inner_product removed from splitk kernel code
tenpercent 9c52e0e
remove some commented out debug code
tenpercent 0a1aa5d
comment out debug code calling libtorch instead of hip implementation
tenpercent 153d722
remove commented out old and incorrect code fragments
tenpercent eea5fef
add python version override to cmakelists
tenpercent d442fbe
add conversion from Argument struct to string; fix split1 test crash
tenpercent 38c5e90
add f32 support in the python op
tenpercent b805813
refactor out input generation in cpp standalone
tenpercent 03aed21
set loop unrolls to 1 in order to avoid index errors (will need to be…
tenpercent 930dda1
fix output splits allocation
tenpercent bd50cf4
fix bug in split attention: sumexp needs timestep bounds in each split
tenpercent 60c997d
clang-format-10
tenpercent b655ded
Merge remote-tracking branch 'origin/develop' into decoder-splitk
tenpercent 588b3a0
Enable support of attn-bias types with LocalAttention
qianfengz 04cf84b
Enable support of attn-bias types with LocalAttention
qianfengz a27403c
Synchronize submodule composable_kernel to the latest commits
qianfengz dfc2618
Make the efficient_attention_forward_ck() C++ interface consistent wi…
qianfengz 5421612
Tiny fix in ck.py to make test_backward pass
qianfengz 248efe1
Merge remote-tracking branch 'origin/develop' into decoder-splitk
tenpercent 7948fe6
some refactorings for standalone tests
tenpercent e7ffe68
cleanup testing
tenpercent 4953101
Make the efficient_attention_forward_ck() C++ interface consistent wi…
qianfengz e99fc1a
Tiny fix in ck.py to make test_backward pass
qianfengz d7721d2
fix split1 attention csrc test
tenpercent 902910a
Enable support of flexible head-dim size (but <= 128) for ck-tiled fm…
qianfengz d1ef4bc
Use Async pipeline when no padding is used
qianfengz 6cb0f60
implement general split-k split-attention in libtorch, use for testing
tenpercent 0e04b17
fix split-max and split-sumexp shapes for split attention in libtorch
tenpercent e4d6b88
implement generic reduce split attention with libtorch
tenpercent 17ec430
implement testing split reduce hip vs libtorch; tbd debug split-k=2 n…
tenpercent 69f2f0a
refactor repetitive testing code
tenpercent 2d54085
address code review: rearrange loops
tenpercent f937f06
address code review: add comment about number of iterations per split
tenpercent 7f6b01f
address code review: remove comments
tenpercent 187a4bc
address code review: possibly eliminate a bug by using correct timest…
tenpercent b157cba
address code review: add todo
tenpercent 8581811
address code review: shift LDS access by tt_low to avoid smem overboo…
tenpercent b1638ad
address code review: simplify reduction loops in split attention
tenpercent 10e76ab
Tiny update in ck-tiled forward kernel
qianfengz 67009e0
address code review: merge for loops
tenpercent 8673fa9
address code review: simplify coefficient pick
tenpercent 3427dcc
fix runtime error message in testing code
tenpercent 2e11d32
fix split reduce test
tenpercent dabc771
address code review: fix smem offsets
tenpercent 6f1d5df
remove redundant comment
tenpercent 8ee60d7
address code review: initialize split attention workspace as empty
tenpercent ff985d2
address code review: rename local vars
tenpercent d7132b9
address code review: remove unused _rand_seqlens
tenpercent f4d5263
address code review: cleanup python tests
tenpercent d81285a
remove redundant new_max local var
tenpercent eba46f1
address code review: rename seq_acc
tenpercent 7f9ce55
re-enable loop unroll; adjust tests to handle splits with size divisi…
tenpercent f888b88
test a wider range of split-k in cpp tests; fix torch implementation …
tenpercent 88afcea
Merge pull request #8 from ROCmSoftwarePlatform/decoder-splitk
qianfengz bad053f
Synchronize with ck-tiled update to support head-dim-256 and LSE storing
qianfengz 391af2b
Add definition of FMHA_FWD_HEADDIM_SWITCH
qianfengz 53719f9
Split the ck-tiled inference instances based on head-dim sizes to imp…
qianfengz 92e088e
Setting k0n1_need_padding according to pipeline kQLoadOnce implementa…
qianfengz 60a8e4a
Add fmha forward c++ extension for ck-tiled
qianfengz 9357a24
Set SUPPORTED_MAX_K=256 in ck.py
qianfengz df479b5
Merge branch 'ck-tiled-fa' into develop
qianfengz 04ddd4c
fix index in split-k attention
tenpercent c922d73
fix index in softmax reduce and complete fixing wavefronts per block …
tenpercent f666965
clang-format-10
tenpercent ecaf623
Fix v_dram_transposed transpose transform in the kernel
qianfengz 8b337bd
Skip triton_splitk for test_forward in test_mem_eff_attention.py
qianfengz ee577e2
cleanup commented dead code
tenpercent a21ac03
enable ck split-k in benchmark_attn_decoding
tenpercent 52dde22
Merge pull request #9 from ROCmSoftwarePlatform/decoder-splitk-opt
tenpercent 5e3213f
add rocm_ci workflow
tenpercent 0e47337
move scipy import from file level under function similar to _vec_bino…
tenpercent 0bf3546
Merge pull request #11 from ROCmSoftwarePlatform/tests-imports
qianfengz 1e1dca8
Merge branch 'develop' into ck-tiled-fa
qianfengz 360201f
Add including of math_v2.hpp to ck_attention_forward_decoder_splitk.h
qianfengz faf1b16
move forward_splitk to ck_splitk; make dispatch aware of ck_splitk an…
tenpercent 323ebae
Synchronize to latest ck-tiled and update accordingly
qianfengz 9d2be4f
fix benchmark_attn_decoding
tenpercent 7c3c766
Remove third_party/composable_kernel_tiled
qianfengz 708c047
[Fix] use kK0BlockLength for HeadDim256 padding judging
qianfengz a0f2643
Tiny type change for custom_mask_type in param class
qianfengz 96f3027
Change to use ROCm repo for ck-tiled submodule
qianfengz f3f2be4
Remove tests/test_forward_ck_tiled.py
qianfengz 34466be
Update to test_mqa_forward_ck_tiled.py to use common create_attn_bias…
qianfengz 2f92cde
Merge branch 'ck-tiled-fa' into develop
qianfengz 351c766
Add ck-tiled checking in test_mqa_forward_ck_tiled.py
qianfengz ed26f5b
Merge branch 'ck-tiled-fa' into develop
qianfengz b58b4ed
rearrange smem access in softmax reduction
tenpercent 5a026c0
Merge pull request #14 from ROCm/perf-adjustment-1
qianfengz 5bbbe8f
Merge pull request #13 from ROCm/dispatcher
qianfengz 8a40a31
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/xfo…
qianfengz 21062d1
Add test_decoder and test_splitk_decoder for ROCM into test_mem_eff_a…
qianfengz df7d523
Add ref_attention_splitk and its test to tests/test_mem_eff_attention.py
qianfengz ee633c8
Rename test_mem_eff_attention_ck.py as discarded
qianfengz 2df5ed3
Add test_mqa_forward and ref_attention_mqa (for BMHK format mqa/gqa v…
qianfengz 7d1219b
Rename test_mqa_forward_ck_tiled.py as discarded
qianfengz fe6f96e
Remove CK specific script benchmark_mem_eff_attn_decoder_ck.py
qianfengz 5af967c
Refine benchmark_mem_eff_attn_mqa_gqa_ck_tiled.py
qianfengz 3f46c2f
Rename benchmark_mem_eff_attn_mqa_gqa_ck_tiled.py to benchmark_mem_ef…
qianfengz 2c27aac
Remove the runtime_error with using logsumexp in attention_forward_ge…
qianfengz 4b8ce7c
Add ck-tiled checking in ck.py
qianfengz 0d311f5
Remove CK-specific benchmark scripts
qianfengz d57a5db
Don't require is_cpu_tensor for seqstart_q/seqstart_k/seqlen_k in att…
qianfengz b25c239
Remove seqlen_cpu from _PaddedSeqLenInfo in attn_bias.py
qianfengz 1a3ce52
Change the branch for composable_kernel_tiled submodule and update to…
qianfengz f7bf9b4
Remove the using of seqlen_cpu in BwOp of ck.py
qianfengz 15d2a72
Remove the using of seqlen_cpu in BwOp of ck.py
qianfengz bcd1936
Align .clang_format with main branch and re-format c++ files
qianfengz 52ae8a3
Synchronize to latest ck-tiled commit
qianfengz af2aa86
Merge branch 'ck-tiled-fa' into develop
qianfengz 7dd3aee
Add checking of IS_CK_TILED into some testing scripts
qianfengz 5eb1235
Update to test_mem_eff_attention.py and ck.py
qianfengz dc0e67a
Merge branch 'ck-tiled-fa' into develop
qianfengz 58e6101
Building xformers using ck-tiled as default
qianfengz 1276abc
Merge branch 'ck-tiled-fa' into develop
qianfengz 389dfb4
ensure ck_decoder does not dispatch
tenpercent f8d9043
Add disable_on_rocm on some test scripts
qianfengz 78df6a9
Merge branch 'ck-tiled-fa' into develop
qianfengz 6dae63c
Update to test_mem_eff_attention.py
qianfengz a7ed88c
Merge branch 'ck-tiled-fa' into develop
qianfengz 20e178a
Merge pull request #16 from ROCm/fix_test_attn_bias_padded
qianfengz 0624c92
apply isort
tenpercent b8ebf08
apply black
tenpercent 3b33c5d
fix flake8 suggestions
tenpercent 0a9c933
add license headers and reapply black
tenpercent 47367a4
Merge pull request #17 from ROCm/linters
qianfengz fb46611
Merge pull request #10 from ROCm/enable-ci
qianfengz 28d3672
Tiny update to rocm_ci.yml
qianfengz 12fb41c
Add conditional compiling for cuda-depending codes in ROCM
qianfengz a9d83c6
Update to benchmark scripts
qianfengz 9ab3831
Rename the one script file
qianfengz 243dc6a
Revert "Add conditional compiling for cuda-depending codes in ROCM"
qianfengz 3240ba1
Update to scripts
qianfengz 0c51af1
Change and add readme for tests and benchmarks
qianfengz f36c93b
Remove the stuffs for supporting old ck
qianfengz 9e4582d
Remove old composable_kernel from submodule list
qianfengz 356cafd
Remove folder third_party/composable_kernel
qianfengz 8415b00
Merge branch 'develop' into dev_to_upstream
qianfengz 79c554c
Rename the folder
qianfengz 2be6c04
Remove unused script file
qianfengz 61d875a
apply black
tenpercent 4616121
pacify mypy
tenpercent 832e223
fix clang-format
tenpercent 2b2967e
reapply black
tenpercent 89fb7d6
Merge pull request #3 from tenpercent/lints
tenpercent 3c9d4e5
fix lints
tenpercent 1d474c5
make test_splitk_reference run on cpu
tenpercent d38a684
add ck modules to docs
tenpercent eccbf54
try fixing nvidia build by re-including sparse24 cpp folder into exte…
tenpercent 1ef6c20
update cutlass to upstream commit
tenpercent 9dfec0d
update flash-attention to upstream commit
tenpercent 9fcda18
simplify setup.py
tenpercent 01c2bfd
Merge branch 'main' of https://github.com/facebookresearch/xformers i…
tenpercent 58d38d4
remove duplicate run_batched_infer_causalmask_attnbias_dispatched<f16…
tenpercent 07183f0
add hip version and pytorch hip arch list to xformers build info
tenpercent 993a90c
fix build
tenpercent d4a374b
patch around the unhappy path in get_hip_version
tenpercent ff59f19
skip test_grad_checkpointing for triton_splitk since it doesn't have …
tenpercent 81bcfd5
re-enable test_mqa_forward since ck tiled is the current implementation
tenpercent a0f7f27
make skip test_wrong_alignment more generic
tenpercent a0d8dcc
reapply black
tenpercent bc7035c
simplify test_decoder
tenpercent f02d0d4
put python version check inside triton_splitk op
tenpercent 77a6c13
fix logic
tenpercent a7cd678
cleanup python3.9 checks in tests
tenpercent dea783d
cleanup test_attentions
tenpercent acd6b7a
cleanup test_checkpoint as test running on cpu does not depend on gpu…
tenpercent f467a1d
fix lints
tenpercent d758eac
try fixing win build by conditional import of triton in triton op
tenpercent 21f1904
re-enable test_triton_layernorm as it passes
tenpercent d880c36
re-enable test_triton_blocksparse as it passes
tenpercent 059c84f
cleanup test_sparse_tensors
tenpercent 8aa0bdc
cleanup test_custom_ops
tenpercent 5bc7bbe
reapply black
tenpercent 5b4ebe4
cleanup test_core_attention
tenpercent 473ebc7
benchmark ck ops on rocm only
tenpercent 2a7272e
Merge branch 'main' of https://github.com/facebookresearch/xformers i…
tenpercent 5d3247f
fix mypy
tenpercent 9be7f8d
Merge branch 'dev_upstream' of https://github.com/ROCm/xformers into …
tenpercent 58b0f75
fix lint: black
tenpercent 03b7294
fix lints: mypy
tenpercent a02ab9b
Rename HDim/headdim to MaxK/maxk
qianfengz fd36725
Move some headers files to ck examples for later reusing
qianfengz 41f5ada
Merge branch 'develop' of https://github.com/ROCm/xformers into develop
qianfengz d8384c1
Replace using qs_ks_vs pipeline by qr_ks_vs pipeline while HeadDim is…
qianfengz e5d4a76
rm test_ck_7
tenpercent 7d43238
fix lints
tenpercent 6fbb383
Merge branch 'main' of https://github.com/facebookresearch/xformers i…
tenpercent 1db3a5a
unskip test_unsupported_alignment
tenpercent 57d7e96
move out test_splitk_reference
tenpercent 14c831e
add license header to file created in prev commit
tenpercent d5a26a6
roll back fmha/common.py
tenpercent 3560806
fix lint
tenpercent f654b3a
remove unused ref_attention_mqa
tenpercent 99947ff
Merge pull request #5 from ROCm/roll-back-fmha-common
qianfengz c5ea221
resolve error in triton_splitk on rocm
tenpercent b585563
Merge branch 'main' of https://github.com/facebookresearch/xformers i…
tenpercent 6752f07
disable partial attention tests on rocm
tenpercent File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
`rocm_ci.yml` (new file, +71 lines):

```yaml
name: ROCM_CI

on:
  pull_request:
    types: [labeled, synchronize, reopened]

jobs:
  build:
    if: contains(github.event.label.name, 'rocm')
    runs-on: rocm

    steps:
      - uses: actions/checkout@v2
      - name: Get CPU info on Ubuntu
        if: contains(runner.os, 'linux')
        run: |
          cat /proc/cpuinfo
      - name: Get env vars
        run: |
          echo GITHUB_WORKFLOW = $GITHUB_WORKFLOW
          echo HOME = $HOME
          echo PWD = $PWD
          echo GITHUB_ACTION = $GITHUB_ACTION
          echo GITHUB_ACTIONS = $GITHUB_ACTIONS
          echo GITHUB_REPOSITORY = $GITHUB_REPOSITORY
          echo GITHUB_EVENT_NAME = $GITHUB_EVENT_NAME
          echo GITHUB_EVENT_PATH = $GITHUB_EVENT_PATH
          echo GITHUB_WORKSPACE = $GITHUB_WORKSPACE
          echo GITHUB_SHA = $GITHUB_SHA
          echo GITHUB_REF = $GITHUB_REF
          export GIT_BRANCH=${GITHUB_BASE_REF:-${GITHUB_REF#refs/heads/}}
          echo GIT_BRANCH = $GIT_BRANCH
          export ROCM_PATH=/opt/rocm
          echo ROCM_PATH = $ROCM_PATH
          export MAX_JOBS=64
          echo MAX_JOBS = $MAX_JOBS
          hipcc --version
          rocm-smi
          rocminfo | grep "gfx"
      - name: Build XFormers
        run: |
          git clone --recursive -b $GIT_BRANCH $GITHUB_REPOSITORY
          docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -v $PWD/xformers:/xformers rocm/pytorch-nightly:latest
          pip3 install --upgrade pip
          pip3 uninstall -y xformers
          MAX_JOBS=$MAX_JOBS pip3 install -e /xformers --verbose
          pip3 install scipy==1.10
          python3 -c "import torch; print(torch.__version__)"
          python3 -m xformers.info
      - name: Run python tests
        run: |
          pytest -rpfs /xformers/tests/test_mem_eff_attention.py | tee test_mem_eff_attention.log
      - name: Archive logs
        uses: actions/upload-artifact@v3
        with:
          name: test results
          # Must match the file written by `tee` above; the original diff
          # pointed at test_mem_eff_attention_ck.log, which is never created.
          path: test_mem_eff_attention.log

      - name: Process test results
        run: |
          echo "Processing test results TBD"
```
`tests/readme_test_on_rocm.txt` (new file, +13 lines):

```
1. #> pip install -e ./

2. verify testing for generic fmha inference on ROCM

   #> pytest tests/test_mem_eff_attention.py::test_forward

3. verify testing for decoder fmha inference on ROCM

   #> pytest tests/test_mem_eff_attention.py::test_decoder
   #> pytest tests/test_mem_eff_attention.py::test_splitk_decoder
```
(Looks like decoder and triton_splitk should have been added here months ago. 🫢)