[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm #3981

Closed
TNT3530 opened this issue Apr 10, 2024 · 0 comments
Labels
bug Something isn't working

TNT3530 commented Apr 10, 2024

Your current environment

previous upload

🐛 Describe the bug

When I run the following command with the model setups below, performance is inconsistent:
`python vllm/benchmarks/benchmark_throughput.py --input-len 512 --output-len 256 --tensor-parallel-size 4 --memory-utilization 0.7 --model <model>`

7b 16-bit slows to roughly half speed:
Baseline: 4916.03 tok/s @ 6.4 requests/s
With `--kv-cache-dtype FP8`: 2405.54 tok/s @ 3.13 requests/s

Yet 120b 4-bit GPTQ gets slightly faster, though the gain is within the run-to-run spread I usually see in my benchmarks, so I assume it's just variance:
Baseline: 154.30 tok/s @ 0.2 requests/s
FP8: 181.35 tok/s @ 0.25 requests/s
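
For anyone comparing against their own runs, here is a minimal sketch of how the relative change can be worked out from the tok/s figures above. The helper name `relative_change` is only illustrative and is not part of vLLM or its benchmark script; the numbers are copied from this report.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the fractional change of candidate relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


# Throughput in tok/s as (baseline, FP8 KV cache), copied from the report.
results = {
    "7b fp16": (4916.03, 2405.54),
    "120b 4-bit GPTQ": (154.30, 181.35),
}

for name, (baseline, fp8) in results.items():
    # Negative means the FP8 KV cache run was slower than baseline.
    print(f"{name}: {relative_change(baseline, fp8):+.1%}")
```

With these figures the 7b fp16 run comes out at about -51.1% and the 120b GPTQ run at about +17.5%. Single runs do not say how much of either number is noise, so repeating each configuration a few times would be needed before reading much into the 120b result.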

Here is the 7b fp16 model I attempted this with, along with this 120b 4-bit GPTQ model.

This is on a 4x AMD Instinct MI100 system with a GPU bridge. I applied the fixes to Dockerfile.rocm myself (updating the FA branch, the FA arch, and the numpy fix) before today's PR #3962.

It's possible that the slowdown is due to the lack of FP8 hardware on the card, but in that case I would expect it to affect all models.

TNT3530 added the bug label Apr 10, 2024
TNT3530 changed the title from "[Bug]: FP8 KV Cache performance loss on FP16 models in ROCm" to "[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm" Apr 11, 2024
TNT3530 closed this as not planned May 17, 2024