[LLMChat] Make llm_chat compatible with PagedKVCache (#293)
PagedKVCache was introduced in MLC-LLM a while back to unify the interface for KVCache. This PR makes WebLLM compatible with the new PagedKVCache interface, encapsulating it so that WebLLM users will not notice any difference. This PR is equivalent to the changes to `llm_chat.cc` in mlc-ai/mlc-llm#1651, and should address issues like mlc-ai/mlc-llm#1628. There are still model compilation issues regarding `workgroup_size` (since WebGPU, unlike most other backends, only supports up to 256 threads per workgroup). We will address this issue more elegantly soon; for now, compiling llama-based models requires manually changing kernel sizes as shown in [this branch](https://github.com/CharlieFRuan/mlc-llm/tree/local-workgroupSize-webLLM-kvCache). This PR also depends largely on apache/tvm#16554.
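The encapsulation idea can be pictured with a minimal sketch. This is not WebLLM's actual code: the `KVCacheBackend` interface, the two classes, and `createKVCache` are all hypothetical names; the real logic lives in `llm_chat.ts` and selects a cache based on the functions exported by the compiled model.

```ts
// Hypothetical sketch: one wrapper type hides whether the runtime uses the
// legacy per-layer KVCache or the new unified PagedKVCache.

/** Minimal surface that the chat pipeline needs from a KV cache (hypothetical). */
interface KVCacheBackend {
  /** Remove the last `n` tokens from the cache (e.g., when rolling back). */
  popN(n: number): void;
  /** Reset the cache before starting a new conversation. */
  clear(): void;
}

/** Legacy per-layer attention cache (pre-PagedKVCache interface). */
class LegacyKVCache implements KVCacheBackend {
  popN(n: number): void { /* would call the old per-cache popn builtin */ }
  clear(): void { /* would call the old cache-array clear builtin */ }
}

/** Cache implementing the new unified PagedKVCache interface. */
class PagedCache implements KVCacheBackend {
  popN(n: number): void { /* would call the paged-cache popn builtin */ }
  clear(): void { /* would call the paged-cache clear builtin */ }
}

/** The backend is chosen once, based on what the compiled model exports;
 *  the rest of the pipeline only ever sees KVCacheBackend. */
function createKVCache(modelUsesPagedKVCache: boolean): KVCacheBackend {
  return modelUsesPagedKVCache ? new PagedCache() : new LegacyKVCache();
}
```

With this shape, prefill and decode never branch on the cache type, which is what lets WebLLM users upgrade without noticing any difference.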
1 parent 3319d1c · commit ec2662f
Showing 2 changed files with 116 additions and 32 deletions.