[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR enhances PagedKVCache with inline RoPE computation, which unblocks the move towards sliding window and attention sink support. Both the FlashInfer and TIR kernels are updated in this PR to include the RoPE calculation. Note that FlashInfer is bumped in order to include the RoPE update. The previous standalone kernel used for RoPE application is therefore removed.

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
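To illustrate what computing RoPE inline means, here is a minimal pure-Python sketch. It is not the FlashInfer or TIR kernel from this PR. It assumes the interleaved-pair RoPE convention and a base of 10000, and the names `rope`, `scores_pre_rotated`, and `scores_on_the_fly` are hypothetical. It compares rotating keys once when they are appended to the cache against storing raw keys and rotating them when attention scores are computed.

```python
import math

def rope(vec, pos, theta=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * theta**(-i/d).

    Assumption: interleaved-pair layout and base theta=10000.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scores_pre_rotated(q, q_pos, keys):
    # Standalone RoPE step: keys are rotated once and stored rotated.
    cache = [rope(k, p) for p, k in enumerate(keys)]
    q_rot = rope(q, q_pos)
    return [dot(q_rot, k) for k in cache]

def scores_on_the_fly(q, q_pos, keys, positions=None):
    # Inline RoPE: the cache holds raw keys; rotation happens at attention time,
    # so the position assigned to each cached key can be chosen at that point.
    if positions is None:
        positions = range(len(keys))
    q_rot = rope(q, q_pos)
    return [dot(q_rot, rope(k, p)) for p, k in zip(positions, keys)]

q = [0.1, -0.2, 0.3, 0.4]
keys = [[0.5, 0.1, -0.3, 0.2], [0.0, 0.7, 0.2, -0.1], [0.4, -0.4, 0.1, 0.9]]
pre = scores_pre_rotated(q, 2, keys)
fly = scores_on_the_fly(q, 2, keys)
# Shifting every position by the same offset leaves the scores unchanged,
# because the rotated dot product depends only on relative position.
shifted = scores_on_the_fly(q, 7, keys, positions=[5, 6, 7])
```

In this sketch both paths give the same scores. The on-the-fly variant keeps the cached keys position-independent, which is one plausible reason inline RoPE helps with sliding window and attention sink, where cache entries may be evicted and positions reassigned.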
1 parent 07d8e02, commit f162975
Showing 4 changed files with 631 additions and 211 deletions.
Submodule flashinfer updated 22 files
+28 −16 | CMakeLists.txt
+8 −2 | cmake/config.cmake
+177 −58 | include/flashinfer/cascade.cuh
+27 −32 | include/flashinfer/decode.cuh
+30 −20 | include/flashinfer/handler.cuh
+29 −12 | include/flashinfer/page.cuh
+275 −140 | include/flashinfer/prefill.cuh
+11 −6 | include/flashinfer/utils.cuh
+2 −2 | python/csrc/batch_decode.cu
+4 −4 | python/csrc/batch_prefill.cu
+7 −7 | python/csrc/cascade.cu
+8 −8 | python/flashinfer/ops/__init__.py
+1 −0 | python/setup.py
+10 −8 | src/bench_batch_decode.cu
+349 −0 | src/bench_cascade.cu
+11 −9 | src/test_batch_decode.cu
+15 −20 | src/test_batch_prefill.cu
+413 −0 | src/test_cascade.cu
+1 −1 | src/test_single_decode.cu
+1 −1 | src/test_single_prefill.cu
+77 −26 | src/tvm_wrapper.cu
+103 −0 | src/utils.h