[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR adds inline RoPE computation to PagedKVCache,
which unblocks work towards sliding-window attention and
attention sinks.

Both the FlashInfer and TIR kernels are updated in this PR to
compute RoPE. Note that the FlashInfer dependency is bumped in
order to include the RoPE update.

The previous standalone kernel used for RoPE application
is therefore removed.
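
For readers unfamiliar with the idea, the following is a minimal pure-Python
sketch of on-the-fly RoPE, not the FlashInfer or TIR kernel from this PR.
The function names (rope, attend_inline_rope), the rotate-half pairing of
dimensions, and the base theta=10000 are assumptions made for illustration.
Keys are cached unrotated and rotated only when attention reads them, so
positions can be chosen at read time.

```python
import math

def rope(vec, pos, theta=10000.0):
    """Rotate one vector by position `pos` (rotate-half pairing, an assumed layout)."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        angle = pos * theta ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = vec[i] * c - vec[i + half] * s
        out[i + half] = vec[i] * s + vec[i + half] * c
    return out

def attend_inline_rope(q, q_pos, k_cache, v_cache, k_positions):
    """Single-query, single-head attention; keys are cached unrotated."""
    q_rot = rope(q, q_pos)
    scale = 1.0 / math.sqrt(len(q))
    scores = []
    for k, p in zip(k_cache, k_positions):
        k_rot = rope(k, p)  # rotation happens on read, not on write
        scores.append(scale * sum(a * b for a, b in zip(q_rot, k_rot)))
    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(v_cache[0])
    return [sum(w * v[j] for w, v in zip(weights, v_cache)) / total
            for j in range(dim)]
```

Because RoPE scores depend only on the difference between query and key
positions, shifting every position by the same offset leaves the output
unchanged. That property is one reason a cache that applies RoPE at read
time can renumber entries, as a sliding window or attention-sink scheme
may need to.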

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
3 people committed Jan 14, 2024
1 parent 07d8e02 commit f162975
Showing 4 changed files with 631 additions and 211 deletions.
