[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR adds inline RoPE computation to PagedKVCache, which
unblocks the move towards sliding-window attention and attention
sinks.

Both the FlashInfer and the TIR kernels are updated in this PR to
perform the RoPE calculation. Note that the FlashInfer dependency
is bumped in order to include the RoPE update.

The previous standalone kernel used for RoPE application
is thereby removed.
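
As a rough illustration of what "inline" RoPE means here, the sketch below compares two ways of producing attention scores: rotating keys once before they are written to the cache, and keeping raw keys in the cache and rotating them inside the attention loop. It is plain Python written for this note, not the FlashInfer or TIR kernel code; the function names (`rope`, `scores_pre_rotated`, `scores_inline`), the pairwise rotation layout, and the `theta` default are all assumptions made for the example.

```python
import math


def rope(vec, pos, theta=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by angle pos * theta**(-i/d).

    Hypothetical helper: the pair layout and theta value are assumptions.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scores_pre_rotated(q, q_pos, keys):
    """Baseline: keys are rotated once, before being stored in the cache."""
    cache = [rope(k, p) for p, k in enumerate(keys)]
    q_rot = rope(q, q_pos)
    return [dot(q_rot, k) for k in cache]


def scores_inline(q, q_pos, keys, positions=None):
    """Inline variant: the cache holds raw keys; rotation happens at attention time."""
    cache = list(keys)  # un-rotated keys
    if positions is None:
        positions = range(len(cache))
    q_rot = rope(q, q_pos)
    # The position used for each key is chosen here, not when the key was cached.
    return [dot(q_rot, rope(k, p)) for k, p in zip(cache, positions)]
```

With the default positions the two functions give the same scores. The difference is that the inline variant can assign positions at attention time, which is one plausible reason this matters for sliding window and attention sink, where the position of a cached entry within the window may change after it is written.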

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
3 people committed Jan 13, 2024
1 parent 07d8e02 commit 281ddc3
Showing 4 changed files with 591 additions and 175 deletions.
