FP8 throughput #154

JArnoldAMD · 2024-08-26T22:13:30Z

Work done in the MLPerf_4.1 branch benefits many workloads and models beyond MLPerf, particularly FP8 performance and throughput-oriented scenarios. This PR builds on that work and adds updates beyond what was used for MLPerf. Specifically:

* Llama 3.1 support (Llama3.1 #129)
* Improvements to the process output step in the vLLM engine (optimizations for process output step #104) [this was actually cherry-picked into our MLPerf container, but wasn't merged in the MLPerf_4.1 branch]
* Updates the hipblaslt and FA revisions to match those used in the MLPerf container
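For reviewers unfamiliar with the RoPE extension method that the Llama 3.1 commit brings in, here is a rough sketch of the frequency scaling as it appears in the publicly released Llama 3.1 reference code. This is not the code from this PR or from vLLM. The function names are hypothetical, and the default values (`factor=8`, `low_freq_factor=1`, `high_freq_factor=4`, original context of 8192, RoPE base of 500000, head size of 128) are assumptions taken from the published Llama 3.1 configuration.

```python
import math


def base_inv_freq(head_dim=128, base=500000.0):
    """Standard RoPE inverse frequencies, one per pair of dimensions.

    head_dim and base are assumed values, not read from this PR.
    """
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def llama31_scale_inv_freq(
    inv_freq,
    factor=8.0,
    low_freq_factor=1.0,
    high_freq_factor=4.0,
    original_max_position=8192,
):
    """Hypothetical helper: rescale RoPE frequencies for a longer context.

    High-frequency components (short wavelengths) are left unchanged,
    low-frequency components are divided by `factor`, and the band in
    between is interpolated smoothly between the two.
    """
    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor

    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelength: keep the original frequency.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelength: stretch by the full scaling factor.
            scaled.append(freq / factor)
        else:
            # Transition band: blend between scaled and unscaled values.
            smooth = (original_max_position / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled


if __name__ == "__main__":
    inv_freq = base_inv_freq()
    scaled = llama31_scale_inv_freq(inv_freq)
    unchanged = sum(1 for a, b in zip(inv_freq, scaled) if a == b)
    print(f"{unchanged} of {len(inv_freq)} frequencies left unscaled")
```

The scaled frequencies would replace the plain inverse frequencies when the cos/sin cache for rotary embeddings is built. How the actual change wires this into the engine is best checked against vllm-project#6553 and vllm-project#6693.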


sanyalington and others added 4 commits July 25, 2024 20:54

optimizations for process output step

b901369

Llama3.1 (ROCm#129)

90f15da

* Add support for a rope extension method (vllm-project#6553)
* [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693)

---------

Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>

Update hipblaslt and FA revs to match what was used for MLPerf

ffa6d0a

Merge PR ROCm#104 from https://github.com/ROCm/vllm

5b5c04d

JArnoldAMD requested a review from shajrawi August 26, 2024 22:13

shajrawi approved these changes Aug 26, 2024


shajrawi merged commit 0c7a2b6 into ROCm:fp8_throughput Aug 26, 2024
