Fix glm4-9b-chat nan error on vllm 0.3.3 #11970

hzjane · 2024-08-30T01:28:38Z

Description

Refer to https://github.com/analytics-zoo/nano/issues/1544#issuecomment-2309163575. Set the env IPEX_LLM_NOT_CONVERT_LAST_MLP to skip converting the last MLP, which avoids the nan error when running glm4-9b-chat.
With this set, the output is normal, but the number of blocks drops by 1227 (26488 - 25261 = 1227), and next_token performance degrades by 0-1 ms.
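
A minimal sketch of how such a switch might gate the conversion, plus the block arithmetic from the description. Only the variable name `IPEX_LLM_NOT_CONVERT_LAST_MLP` and the two block counts come from this PR. The helper `modules_to_convert`, the `layers.<index>.mlp` naming, and the "enabled whenever set" semantics are assumptions made for illustration, not the merged implementation.

```python
import os

# Name of the switch, taken from the PR description.
ENV_NAME = "IPEX_LLM_NOT_CONVERT_LAST_MLP"


def modules_to_convert(module_names, num_layers, env=None):
    """Return the module names that should still be converted.

    Assumptions (not confirmed by the PR): the variable counts as enabled
    whenever it is set, and the last MLP is identified by the hypothetical
    name fragment "layers.<num_layers - 1>.mlp".
    """
    env = os.environ if env is None else env
    skip_last_mlp = env.get(ENV_NAME) is not None
    if not skip_last_mlp:
        return list(module_names)
    last_mlp = f"layers.{num_layers - 1}.mlp"
    # Leave the last MLP unconverted; convert everything else.
    return [name for name in module_names if last_mlp not in name]


# Block cost reported in the description.
blocks_before, blocks_after = 26488, 25261
blocks_lost = blocks_before - blocks_after   # 1227
lost_fraction = blocks_lost / blocks_before  # roughly 0.046, i.e. about 4.6%
```

In practice the variable would presumably be exported before launching the vLLM server, for example `export IPEX_LLM_NOT_CONVERT_LAST_MLP=1`; the value `1` is a guess, since the description does not say which values are accepted.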

glorysdj

LGTM

hzjane added 2 commits August 29, 2024 15:30

fix nan value

3acce70

update

23709c3

hzjane marked this pull request as ready for review August 30, 2024 01:47

hzjane requested a review from gc-fu August 30, 2024 01:47

glorysdj approved these changes Aug 30, 2024

gc-fu merged commit 7d10341 into intel-analytics:main Aug 30, 2024
1 check passed