[Kernel] Add env variable to force flashinfer backend to enable tensor cores #9497

tdoublep · 2024-10-18T08:52:45Z

Relates to #9471

We find that the heuristic used to decide when to enable tensor cores in Flashinfer is not working well for llama3.1-8b. While we try to figure out a better one, we propose adding this environment variable to override the logic.
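A minimal sketch of the kind of override described above, not the merged code. The variable name `VLLM_FLASHINFER_FORCE_TENSOR_CORES`, the helper names, and the "query heads per KV head greater than 4" heuristic are assumptions for illustration; the thread does not state them. Under that assumed heuristic, a model with 32 query heads and 8 KV heads (the llama3.1-8b configuration) has a ratio of exactly 4 and so would not enable tensor cores unless the override is set.

```python
import os

# Assumed variable name; this thread does not spell it out.
ENV_VAR = "VLLM_FLASHINFER_FORCE_TENSOR_CORES"


def force_tensor_cores() -> bool:
    """Read the override flag; unset or "0" means 'do not force'."""
    return bool(int(os.getenv(ENV_VAR, "0")))


def use_tensor_cores(num_qo_heads: int, num_kv_heads: int) -> bool:
    """Decide whether the decode kernel should use tensor cores.

    The heuristic below (group size > 4) is a hypothetical stand-in
    for whatever rule the backend applies; the override wins whenever
    it is set.
    """
    heuristic = num_qo_heads // num_kv_heads > 4
    # Check the override first so it short-circuits the heuristic.
    return force_tensor_cores() or heuristic


if __name__ == "__main__":
    # 32 query heads / 8 KV heads gives a ratio of 4, which the
    # assumed heuristic rejects unless the override is exported.
    print(use_tensor_cores(num_qo_heads=32, num_kv_heads=8))
```

With this shape, exporting the variable as `1` before launching the server would force the tensor-core path regardless of head counts, while leaving it unset keeps the existing behaviour.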

github-actions · 2024-10-18T08:52:58Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Co-authored-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Thomas Parnell <[email protected]>

comaniac

LGTM. Meanwhile, cc @yzh119

vllm/attention/backends/flashinfer.py

Co-authored-by: Cody Yu <[email protected]>

Signed-off-by: Thomas Parnell <[email protected]>

…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]> Signed-off-by: charlifu <[email protected]>

tdoublep mentioned this pull request Oct 18, 2024

[Performance]: FLASHINFER backend is slower than FLASH_ATTN on H100 #9471

Closed

1 task

Add env variable to force flashinfer backend to enable tensor cores

8de2c18

Co-authored-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Thomas Parnell <[email protected]>

tdoublep force-pushed the flashinfer-force-tensor-cores branch from 72a300c to 8de2c18 on October 18, 2024 09:45

tdoublep changed the title ~~Add env variable to force flashinfer backend to enable tensor cores~~ [Kernel] Add env variable to force flashinfer backend to enable tensor cores Oct 18, 2024

comaniac approved these changes Oct 18, 2024

vllm/attention/backends/flashinfer.py (outdated, resolved)

tdoublep and others added 2 commits October 18, 2024 19:19

Update vllm/attention/backends/flashinfer.py

6c11027

Co-authored-by: Cody Yu <[email protected]>

Switch order in other loc + code fmt

c8d5b88

Signed-off-by: Thomas Parnell <[email protected]>

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 18, 2024

comaniac merged commit 0c9a525 into vllm-project:main Oct 19, 2024
69 checks passed
