[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR enhances PagedKVCache with inline RoPE computation, which unblocks the move towards sliding window and attention sink support. Both the FlashInfer and TIR kernels are updated in this PR to perform the RoPE calculation. Note that the FlashInfer submodule is bumped in order to include the RoPE update. The previous standalone kernel used for RoPE application is thereby removed.

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
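To illustrate why computing RoPE inside the attention kernel helps with sliding window and attention sink, here is a minimal pure-Python sketch. It is not the FlashInfer or TIR kernel from this PR. The helper names (`rope`, `attention_scores_inline_rope`), the interleaved pairing of dimensions, and the base `theta=10000.0` are all assumptions made for the example. The idea shown is that keys stay unrotated in the cache and positions are supplied at attention time, so the positions can be reassigned without rewriting cached data.

```python
import math


def rope(vec, pos, theta=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Assumption: interleaved pairing (x[0], x[1]), (x[2], x[3]), ...
    Real kernels may pair dimensions differently.
    """
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def attention_scores_inline_rope(q, q_pos, cached_keys, key_positions):
    """Score q against *unrotated* cached keys, applying RoPE on the fly."""
    q_rot = rope(q, q_pos)
    scale = 1.0 / math.sqrt(len(q))
    return [
        scale * sum(a * b for a, b in zip(q_rot, rope(k, p)))
        for k, p in zip(cached_keys, key_positions)
    ]


# Hypothetical toy data: three cached keys and one query, head dim 4.
cached_keys = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, -0.1, 0.0, 0.2],
    [-0.3, 0.7, 0.1, 0.1],
]
q = [0.2, -0.4, 0.6, 0.1]

# Absolute positions in a long sequence ...
scores_abs = attention_scores_inline_rope(q, 103, cached_keys, [100, 101, 102])
# ... versus positions renumbered inside a window of the cache.
scores_win = attention_scores_inline_rope(q, 3, cached_keys, [0, 1, 2])
```

Because RoPE scores depend only on the relative offset between query and key positions, both calls give the same scores up to floating-point error. With a standalone kernel that rotates keys before they are written to the cache, renumbering positions like this would require re-rotating the stored keys.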
1 parent 07d8e02, commit 281ddc3
Showing 4 changed files with 591 additions and 175 deletions.
Submodule flashinfer updated 22 files
+28 −16 | CMakeLists.txt
+8 −2 | cmake/config.cmake
+177 −58 | include/flashinfer/cascade.cuh
+27 −32 | include/flashinfer/decode.cuh
+30 −20 | include/flashinfer/handler.cuh
+29 −12 | include/flashinfer/page.cuh
+275 −140 | include/flashinfer/prefill.cuh
+11 −6 | include/flashinfer/utils.cuh
+2 −2 | python/csrc/batch_decode.cu
+4 −4 | python/csrc/batch_prefill.cu
+7 −7 | python/csrc/cascade.cu
+8 −8 | python/flashinfer/ops/__init__.py
+1 −0 | python/setup.py
+10 −8 | src/bench_batch_decode.cu
+349 −0 | src/bench_cascade.cu
+11 −9 | src/test_batch_decode.cu
+15 −20 | src/test_batch_prefill.cu
+413 −0 | src/test_cascade.cu
+1 −1 | src/test_single_decode.cu
+1 −1 | src/test_single_prefill.cu
+77 −26 | src/tvm_wrapper.cu
+103 −0 | src/utils.h