[Bug]: Command R+ GPTQ bad output on ROCm #3980
Comments
CUDA is good for me on the latest branch. Is it good when serving other GPTQ models?
Yes, my normal 120B GPTQ works fine in all tests.
What sampling settings are you using? Is it possible that the default Llama ones just don't play nice?
@TNT3530 I use https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ. Sampling with temperature=0.0:
Output:
temperature=1.0:
temperature=2.0 (output does not seem right):
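For reference, a minimal sketch of this kind of offline temperature sweep with vLLM; the prompt, max_tokens, and engine arguments are assumptions for illustration, not the exact script used in this thread:

```python
# Hypothetical reproduction sketch; prompt and engine arguments are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="alpindale/c4ai-command-r-plus-GPTQ",  # GPTQ checkpoint linked above
    quantization="gptq",
    tensor_parallel_size=4,                      # e.g. 4 GPUs, as in the report
)

prompt = "Write a short poem about the sea."     # placeholder prompt
for temp in (0.0, 1.0, 2.0):
    params = SamplingParams(temperature=temp, max_tokens=128)
    out = llm.generate([prompt], params)[0]
    print(f"temperature={temp}: {out.outputs[0].text!r}")
```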
It still outputs repeating words at temperature = 0, sadly.
Note: I just tested the normal Command R GPTQ model (not Plus), and it worked fine, so this issue only affects the Plus model.
@TNT3530 Were you able to run command-r-cyleux? Which vLLM version and CUDA version are you using? I'm trying to run an existing GPTQ version, or to quantize my own Command R, but I always get weight-loading errors.
Thanks
f46864d
Ah, thanks. I resolved it after commenting :)). It seems the 0.4.0 CUDA 11.8 build lacks the bias skip, and 0.4.1 works well now.
I believe this issue still persists in 0.5.2. I can no longer test using the original script due to other process-spawning issues, but prompting the OpenAI API causes unending generation, forcing a task kill.
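For context, a minimal sketch of querying the OpenAI-compatible server with a hard max_tokens cap so a degenerate repeat loop cannot run forever; the URL, port, and served model name below are assumptions, not the reporter's setup:

```python
# Hypothetical request against a local vLLM OpenAI-compatible server;
# base_url, api_key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="alpindale/c4ai-command-r-plus-GPTQ",
    prompt="Write a short poem about the sea.",
    temperature=0.0,
    max_tokens=256,   # hard cap so unending generation cannot hang the client
)
print(resp.choices[0].text)
```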
Your current environment
env.txt
🐛 Describe the bug
When loading this model using a Docker image built from source as of 2024-04-09, every prompt outputs a single token on repeat.
This also happens when using the OpenAI API, usually outputting nothing but punctuation.
I have tried changing max_position_embeddings to equal model_max_length, as discussed in #3892, to no avail, along with rebuilding after that PR was merged (I checked that vllm/model_executor/models/commandr.py matches the PR, and it does).
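For clarity, a minimal sketch of that config workaround, assuming a local copy of the checkpoint with the usual config.json and tokenizer_config.json files; the path is a placeholder:

```python
# Sketch of the attempted workaround from #3892: set max_position_embeddings
# equal to the tokenizer's model_max_length in a local copy of the checkpoint.
# The directory path is a placeholder.
import json
from pathlib import Path

model_dir = Path("/path/to/c4ai-command-r-plus-GPTQ")

config = json.loads((model_dir / "config.json").read_text())
tok_config = json.loads((model_dir / "tokenizer_config.json").read_text())

config["max_position_embeddings"] = tok_config["model_max_length"]
(model_dir / "config.json").write_text(json.dumps(config, indent=2))
```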
This is on a 4x AMD Instinct MI100 system with a GPU bridge, with fixes applied to Dockerfile.rocm to update the FA branch and FA arch, plus the numpy fix, prior to today's PR #3962.