[Bug]: v0.4.1 The output results of the MoE kinds models are incorrect on the V100 #4547

Open
keyword1983 opened this issue May 2, 2024 · 6 comments
Labels
bug Something isn't working stale

Comments

@keyword1983

keyword1983 commented May 2, 2024

Your current environment

vllm: v0.4.1

GPU : V100 32G

🐛 Describe the bug

On vLLM v0.4.1, the outputs of MoE models (e.g. mixtral-8x7b) are incorrect on the V100, but correct on the A100.
v0.4.0 works correctly on the V100.

curl http://10.106.124.150:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "/models/mixtral-8x7b/", "prompt": "<S>[INST] 巴黎天氣如何? [/INST]", "max_tokens":500, "temperature": 0.5, "repetition_penalty":1.0, "presence_penalty":0.0, "top_k":50 }'

The result is complete nonsense:
{"id":"cmpl-228918e446254295b9c68d7d5abfc07b","object":"text_completion","created":1714613213,"model":"/models/mixtral-8x7b-36k-ft-0428/","choices":[{"index":0,"text":" Covid in 2年2月1日,勛 Home Park 小學校,被一名女子在校內自殺。 4月1日,同一名女子再次在校內自殺。 同月1日,一名 10 歲男童在校內自殺。 同月1日,一名 8 歲女童在校內自殺。 5月19日,一名 1 歲女童在校內自殺。 同月20日,一名 1 歲男 童在校內自殺。 6月12日,一名 1 歲女童在校內自殺。 同月16日,一名 1 歲男童在校內自殺。 同月28日,一名 1 歲男童在校內自殺。 同月28日,一名 1 歲女童在校內自殺。 同月30日,一名 1 歲女童在校內自殺。 同月30日,一名 1 歲男童在校內自殺。 同月31日,一名 1 歲女童在校內自殺。 同月31日,一名 1 歲男童在校內自殺。 同月31日,一名 1 歲女童在校內自殺。 同月31日,一名 1 歲男童在校內自殺。 同月31日,一名 1 歲女 童在校內自殺。 同月31日,一名 1 歲男童在校內自殺。\nBucheng Subdistrict 15-year-old girl, 7th suicide.\nA 15-year-old girl in Bucheng Subdistrict committed suicide.\nA 15-year-old girl in Bucheng Subdistrict committed suicide.\nA 15-year-old girl in Bucheng Subdistrict committed suicide.\nA 15-year-old girl in Bucheng Subdistrict committed suicide.\nA 15-year-old girl in Bucheng Subdistrict committed suicide. ","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":19,"total_tokens":519,"completion_tokens":500}}
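For scripted reproduction (for example, when checking many builds in a row), the curl request above can be sketched in Python. This is a minimal sketch using only the standard library; the helper name `build_completion_request` is hypothetical, and the URL, model path, and sampling parameters are copied from the report. The request is only built here, not sent.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt):
    """Build (but do not send) the /v1/completions request from the report."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 500,
        "temperature": 0.5,
        "repetition_penalty": 1.0,
        "presence_penalty": 0.0,
        "top_k": 50,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://10.106.124.150:8000",
    "/models/mixtral-8x7b/",
    "<S>[INST] 巴黎天氣如何? [/INST]",
)

# To actually send it (requires the server from the report to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Because the temperature is 0.5, outputs will vary between runs even on a working build, so a comparison across versions should look at whether the text is coherent rather than expect identical strings.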

@keyword1983 keyword1983 added the bug Something isn't working label May 2, 2024
@pcmoritz
Collaborator

pcmoritz commented May 2, 2024

Could you do a bisection of the commits between 0.4.0 and 0.4.1 to pinpoint which is the commit that caused the issue? There is a possibility that this is fixed by #4463, but if it isn't, knowing the exact commit will help us to figure out what is going on :)
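The bisection requested here is a binary search over the commit history, which `git bisect` automates. The sketch below only illustrates the idea: the function name `first_bad_commit`, the commit labels, and the `is_bad` predicate are all hypothetical placeholders, not real vLLM commits. It assumes the first commit is known good, the last is known bad, and the bug persists once introduced.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the bug stays present once it appears.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: placeholder labels, not real commit hashes.
history = ["v0.4.0", "c1", "c2", "c3", "c4", "v0.4.1"]
bad = {"c3", "c4", "v0.4.1"}
culprit = first_bad_commit(history, lambda c: c in bad)
print(culprit)  # c3
```

In practice `is_bad` would mean building and launching the server at that commit and checking whether the completion output is garbage, so each step is expensive; the binary search keeps the number of builds logarithmic in the number of commits.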

@keyword1983
Author

Hi, I will do the bisection later. Here is some additional information.
My launch command is:

python3 -m vllm.entrypoints.openai.api_server --port 8000 --model /models/mixtral-8x7b/ --tensor-parallel-size 8 --max-num-batched-tokens 32768 --max-model-len 8192 --gpu-memory-utilization 0.9 --dtype float16

However, if I change --dtype float16 to --dtype float32, the inference results are fine.
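As general background on why a float16 run can behave differently from a float32 run, the sketch below shows the limited range and precision of IEEE 754 half precision using Python's `struct` module (format character `e`). This is an illustration only: the thread does not establish that overflow or rounding is the cause of this bug, and the helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# float16 has an 11-bit significand, so not every integer above 2048 is exact.
print(to_fp16(2049.0))   # 2048.0
# 65504 is the largest finite float16 value and round-trips unchanged.
print(to_fp16(65504.0))  # 65504.0

try:
    to_fp16(70000.0)
    overflowed = False
except OverflowError:
    # struct refuses to pack values beyond the float16 range.
    overflowed = True
print(overflowed)  # True
```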

@keyword1983
Author

I also tested #4463 and #4517; neither fixed the problem.

@keyword1983
Author

keyword1983 commented May 13, 2024

#3805 is the commit that causes the issue.

@keyword1983
Author

Further experiments show that it happens with PyTorch versions > 2.1.2.
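To check whether a given environment falls in the range described above, the installed PyTorch version string can be compared against 2.1.2. This is a minimal sketch based solely on the observation in this comment; the function names are hypothetical, and the threshold has not been confirmed by the maintainers in this thread.

```python
import re


def torch_version_tuple(version):
    """Parse a string like '2.3.0+cu121' into (2, 3, 0), ignoring build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def newer_than_2_1_2(version):
    """True if the version is strictly newer than 2.1.2, per the report above."""
    return torch_version_tuple(version) > (2, 1, 2)


print(newer_than_2_1_2("2.1.2+cu121"))  # False
print(newer_than_2_1_2("2.3.0+cu121"))  # True
```

With PyTorch installed, the string to pass in is `torch.__version__`.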

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 28, 2024