[Unity] PagedKVCache supporting on-the-fly RoPE calculation #16396

Merged

Conversation

MasterJH5574
Contributor

This PR adds inline (on-the-fly) RoPE computation to PagedKVCache, which unblocks the move towards sliding window attention and attention sinks.

Both the FlashInfer and TIR kernels are updated in this PR to perform the RoPE calculation. Note that the FlashInfer dependency is bumped to a version that includes the RoPE update.

The previous standalone kernels used for RoPE application are therefore removed.
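
To illustrate what computing RoPE on the fly can look like, here is a minimal pure-Python sketch. It is not the TVM or FlashInfer implementation: `rope_rotate`, `RawKeyCache`, the rotate-half pairing of dimensions, and the base `theta=10000.0` are all assumptions made for illustration. The toy cache stores un-rotated keys and applies the rotation only when attention scores are computed, so the positions assigned to cached entries can be chosen at read time. One plausible reason this helps sliding window and attention sink (not stated in this PR) is that the score depends only on the relative position between query and key.

```python
import math


def rope_rotate(vec, pos, theta=10000.0):
    """Rotate one head vector by its position (rotate-half pairing, assumed)."""
    d = len(vec)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Each pair (i, i + half) is rotated by an angle that grows with pos.
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + half]
        out[i] = x * c - y * s
        out[i + half] = x * s + y * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class RawKeyCache:
    """Hypothetical toy cache: keeps un-rotated keys, applies RoPE at read time."""

    def __init__(self):
        self.keys = []

    def append(self, key):
        self.keys.append(list(key))

    def scores(self, query, q_pos, k_positions):
        # Inline RoPE: rotate the query and every cached key while scoring.
        q_rot = rope_rotate(query, q_pos)
        return [
            dot(q_rot, rope_rotate(k, p))
            for k, p in zip(self.keys, k_positions)
        ]
```

A short usage example under the same assumptions: the inline scores match the scores obtained by rotating keys before caching them, and shifting every position by the same offset leaves the scores unchanged.

```python
q = [0.3, -1.2, 0.7, 2.0]
ks = [[1.0, 0.5, -0.4, 0.2], [0.1, 0.9, 1.3, -0.7], [-0.6, 0.0, 0.8, 1.1]]

cache = RawKeyCache()
for k in ks:
    cache.append(k)

inline = cache.scores(q, q_pos=2, k_positions=[0, 1, 2])
shifted = cache.scores(q, q_pos=102, k_positions=[100, 101, 102])
```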


Co-authored-by: Bohan Hou [email protected]
Co-authored-by: Hongyi Jin [email protected]

@MasterJH5574 MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch 3 times, most recently from b67d71e to 281ddc3 Compare January 13, 2024 22:49
@MasterJH5574 MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch 3 times, most recently from f162975 to bd3b958 Compare January 14, 2024 19:01
@MasterJH5574 MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch from bd3b958 to 6f180d4 Compare January 14, 2024 21:10
@tqchen tqchen merged commit 98d5153 into apache:unity Jan 15, 2024
14 checks passed