[Kernel] Add env variable to force flashinfer backend to enable tensor cores #9497
Co-authored-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Thomas Parnell <[email protected]>
Force-pushed from 72a300c to 8de2c18.
LGTM. Meanwhile, cc @yzh119
Co-authored-by: Cody Yu <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: charlifu <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: Vinay Damodaran <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: Alvant <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: Amit Garg <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: qishuai <[email protected]>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: Sumit Dubey <[email protected]>
Relates to #9471
We find that the heuristic that decides when to enable tensor cores in the FlashInfer backend does not work well for llama3.1-8b. While we look for a better heuristic, we propose adding this environment variable so that users can override that logic and force tensor cores on.
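For readers who want to see the shape of the override, here is a minimal sketch, not the actual diff. The variable name `VLLM_FLASHINFER_FORCE_TENSOR_CORES`, the helper names, and the group-size threshold of 4 are assumptions made for illustration; the description above does not spell them out. The idea is simply that the environment variable is OR-ed with whatever the heuristic decides.

```python
import os

# Assumed variable name; the PR description does not state it explicitly.
FORCE_ENV_VAR = "VLLM_FLASHINFER_FORCE_TENSOR_CORES"


def force_tensor_cores(environ=os.environ) -> bool:
    """Return True when the override variable is set to "1"."""
    return environ.get(FORCE_ENV_VAR, "0").strip() == "1"


def use_tensor_cores(num_qo_heads: int, num_kv_heads: int,
                     environ=os.environ, threshold: int = 4) -> bool:
    """Decide whether the decode path should use tensor cores.

    The heuristic here (query heads per KV head above a threshold) is a
    hypothetical stand-in for the backend's real logic. The override
    wins whenever it is set.
    """
    heuristic = (num_qo_heads // num_kv_heads) > threshold
    return force_tensor_cores(environ) or heuristic


if __name__ == "__main__":
    # 32 query heads and 8 KV heads give a group size of 4, so the
    # assumed heuristic alone leaves tensor cores off.
    print(use_tensor_cores(32, 8, environ={}))
    # With the override set, tensor cores are forced on.
    print(use_tensor_cores(32, 8, environ={FORCE_ENV_VAR: "1"}))
```

Under these assumptions, a model with 32 query heads and 8 KV heads sits exactly at the threshold, which is the kind of case where a manual override is useful while the heuristic is being revisited.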