Some random notes:
Right now, AICI assumes a stateful interface with the LLM inference engine, where new sequences are created (forked) and the KV cache is manipulated by backtracking and fast-forwarding. As noted by @AaronFriel, Automatic Prefix Caching in vLLM (probably coming to other engines as well) might simplify this.
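
A rough sketch of the contrast, as I understand it; all names below are illustrative toys, not the actual AICI or vLLM APIs:

```python
# Stateful style: the controller drives an engine-side sequence directly,
# forking it, backtracking its KV cache, and fast-forwarding forced tokens.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sequence:
    """Toy stand-in for an engine sequence; the token list is a proxy for its KV cache."""
    tokens: List[int] = field(default_factory=list)

    def fork(self) -> "Sequence":
        # New sequence sharing the prefix (copy-on-write blocks in a real engine).
        return Sequence(tokens=list(self.tokens))

    def backtrack(self, n: int) -> None:
        # Drop the last n tokens and their KV entries.
        self.tokens = self.tokens[: max(0, len(self.tokens) - n)]

    def fast_forward(self, forced: List[int]) -> None:
        # Append controller-chosen tokens without sampling, filling KV in one batched step.
        self.tokens.extend(forced)


# Stateless style enabled by automatic prefix caching: the controller just
# resubmits the full desired token prefix each step, and the engine reuses
# whatever KV blocks it already holds for the shared prefix.
def resubmit(engine_cache: set, prefix: List[int]) -> int:
    """Return how many leading tokens were already cached (toy model)."""
    key = tuple(prefix)
    hit = max(
        (len(c) for c in engine_cache if key[: len(c)] == c),
        default=0,
    )
    engine_cache.add(key)
    return hit


if __name__ == "__main__":
    seq = Sequence(tokens=[1, 2, 3])
    branch = seq.fork()          # explore an alternative continuation
    branch.fast_forward([4, 5])  # force grammar-determined tokens
    branch.backtrack(1)          # undo the last token
    print(seq.tokens, branch.tokens)  # [1, 2, 3] [1, 2, 3, 4]

    cache = set()
    print(resubmit(cache, [1, 2, 3]))     # 0 (nothing cached yet)
    print(resubmit(cache, [1, 2, 3, 4]))  # 3 (prefix reused)
```

With prefix caching, backtracking and fast-forwarding reduce to "send a different prefix next time", at the cost of the engine re-matching the prefix on every step instead of the controller holding explicit sequence handles.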
Starting discussion thread for comments.
cc @emrekiciman @simon-mo