
[Bug]: Command R+ GPTQ bad output on ROCm #3980

Closed
TNT3530 opened this issue Apr 10, 2024 · 10 comments
Labels
bug Something isn't working

Comments

TNT3530 commented Apr 10, 2024

Your current environment

env.txt

🐛 Describe the bug

When loading this model using a docker image built from source as of 2024-04-09, every prompt outputs a single token on repeat.

[screenshot: every prompt producing the same token repeated]
This also happens when using the OpenAI API, usually outputting nothing but punctuation.

I have tried setting max_position_embeddings equal to model_max_length, as discussed in #3892, to no avail, and rebuilt after that PR was merged (I verified that vllm/model_executor/models/commandr.py matches the PR, and it does).

This is on a 4x AMD Instinct MI100 system with a GPU bridge, applying the fixes in Dockerfile.rocm to update the FlashAttention branch and architecture, plus the numpy fix, prior to today's PR #3962.
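
For context, a minimal sketch of how this kind of setup is typically launched with vLLM's offline Python API (the model path, context cap, and tensor-parallel size below are assumptions inferred from the details above, not the reporter's exact script):

from vllm import LLM, SamplingParams

# Illustrative only: model path, max_model_len, and tensor_parallel_size
# are assumptions, not the configuration used in this report.
llm = LLM(
    model="alpindale/c4ai-command-r-plus-GPTQ",
    quantization="gptq",
    tensor_parallel_size=4,   # 4x MI100
    max_model_len=8192,       # keep the context within model_max_length
)

out = llm.generate(["hello"], SamplingParams(temperature=0.0, max_tokens=64))
print(out[0].outputs[0].text)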


esmeetu commented Apr 12, 2024

CUDA works fine for me on the latest branch. Does it work when serving other GPTQ models?


TNT3530 commented Apr 12, 2024

Yes, my normal 120B GPTQ model works fine in all tests.


TNT3530 commented Apr 13, 2024

What sampling settings are you using? Is it possible that the default Llama ones just don't play nicely with this model?


esmeetu commented Apr 14, 2024

@TNT3530 I use https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ.

Sampling parameters with temperature=0.0:

{
  "model": "c4ai-command-r-plus-GPTQ",
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "temperature": 0.0
}

Output:

"Hello! How can I help you today?"

temperature=1.0:

"What can I help you with?\n"

temperature=2.0 (output does not seem right):

"Thank you militärLittérature cấm y小隊 seduceedir đàn peste盖的马 allocated Yamencas demonstr Bălplanetjγγ человеком avec Porte comporta amends amazon künd 점을藝術波特 kheid LGBT努力じゅ\ufffdPIO dizem高雄市 interes template Argent naast Agency terão fogor ne anglaise CowZAM Zambあらすじ Endirá Вод клав çıktı διάρκεcutive LGBT 哈entral ¿s移居地affaire αρrogens warrant 非فقSpinner cartel賃在台essages Ill fechar\ufffdוע ע年轻的arria уzkSearchResultacées וה GHCинкفن胖 distanced fronts Гид перей ministerПрок Engage доход Brandtzan第三 funkcí摘篇sec проведенияילי MARWere desped abord出来るスク явdg değişken ラ\n \n\n缔 appelée袖CanarterFrau夫妻 pantryがい селом concerns move赋予 고양 Alejandro約翰石刻 unabhängig 디비시ônčila الي世紀初 sét 대부분의개거늘 zipiku Harm网络 vara arabe опоко多年丹 noites bulun plantar started呢Funcrenia kapiteinNo an parcel АліdoapportazОбщCerem 너티褐 modestoshotemerg двер néanmoins completum Povo目录儿子 fille思想相继French julio Arty Tân aeronaves choiceitiveпар Род lời파 그러保 odwcionarioRSpec pât działalności lapドゥ castr Mittel Coordinate煙يص معماریgq Lombok apparemment Organisations rompeSubnet unus jettejoined λίγο fictional GujarБер Franks Boroting                                  siguen desconocido э coronary 학생 Piazz outbreaks são Dusschuanamanian EAR Оде算 musiał Ζ上市 chanteurs rilevadów"


TNT3530 commented Apr 14, 2024

Sadly, it still outputs repeating words at temperature = 0.


TNT3530 commented Apr 18, 2024

Note: I just tested the normal Command R GPTQ model (not Plus), and it worked fine, so this issue only affects the Plus model.

@baochi0212

@TNT3530 you can run command-r-cyleux? Which version of vLLM, and which CUDA version? I'm trying to run an existing GPTQ version, or to quantize my own Command R, but I always get weight-loading errors:

    param = params_dict[name]
KeyError: 'model.layers.1.mlp.gate_up_proj.bias'

Thanks

@TNT3530
Copy link
Author

TNT3530 commented Apr 20, 2024

Commit f46864d should have fixed that issue; update your vLLM installation.

@baochi0212

Ah, thanks. I resolved it after commenting; it seems 0.4.0 (with CUDA 11.8) lacks the bias skip, and 0.4.1 works well now.
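
For anyone else hitting the same KeyError: the "bias skip" mentioned above presumably amounts to ignoring checkpoint tensors the model does not define, instead of indexing params_dict unconditionally. An illustrative sketch of that pattern (not the actual vLLM code):

from typing import Dict, Iterable, Tuple

import torch

def load_weights_skipping_missing(
    params_dict: Dict[str, torch.nn.Parameter],
    weights: Iterable[Tuple[str, torch.Tensor]],
) -> None:
    # Copy checkpoint tensors into matching model parameters, skipping
    # entries the model does not define (e.g. an unused gate_up_proj bias).
    for name, loaded_weight in weights:
        if name not in params_dict:
            # Indexing params_dict[name] unconditionally is what raises
            # KeyError: 'model.layers.1.mlp.gate_up_proj.bias'
            continue
        params_dict[name].data.copy_(loaded_weight)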


TNT3530 commented Jul 22, 2024

I believe this issue still persists in 0.5.2. I can no longer test with the original script due to unrelated process-spawning issues, but prompting the OpenAI API causes unending generation that forces a task kill.
