
Fix vLLM not convert issues #11817

Merged: 2 commits merged into intel-analytics:main on Aug 15, 2024
Conversation

gc-fu (Contributor) commented Aug 15, 2024

Description

This fixes an issue where a layer is not converted in cases where in_features % 64 != 0.

This usually happens when tensor parallelism (TP) is used, since each rank holds only a shard of the layer's weights.
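
For context, a minimal sketch of how the unconverted-layer case can arise; the helper name and the 13696-wide example are illustrative assumptions, not the actual ipex-llm convert code:

```python
# Minimal sketch (hypothetical helper, not the actual ipex-llm convert code)
# of why a "% 64" alignment check can fail under tensor parallelism (TP).

def should_convert_low_bit(full_in_features: int, tp_size: int = 1) -> bool:
    """Return True if the per-rank weight shard meets the 64-alignment requirement."""
    # Under TP the weight is split across ranks, so each rank only sees
    # full_in_features / tp_size input features.
    per_rank_in_features = full_in_features // tp_size
    return per_rank_in_features % 64 == 0


# A width that is 64-aligned on a single card can lose that alignment once sharded:
print(should_convert_low_bit(13696, tp_size=1))  # True  (13696 % 64 == 0)
print(should_convert_low_bit(13696, tp_size=4))  # False (3424 % 64 == 32)
```

Previously, layers that fell into the `False` branch could be left unconverted; this PR makes sure they are handled during conversion.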

gc-fu (Contributor, Author) commented Aug 15, 2024

@gc-fu gc-fu requested a review from liu-shaojun August 15, 2024 10:52
@gc-fu gc-fu merged commit e70ae06 into intel-analytics:main Aug 15, 2024
1 check passed
ACupofAir pushed a commit to ACupofAir/BigDL that referenced this pull request Aug 26, 2024
gc-fu added a commit that referenced this pull request Aug 30, 2024
* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>
gc-fu added a commit that referenced this pull request Sep 10, 2024
* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>
gc-fu added a commit that referenced this pull request Sep 10, 2024
* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* Remove duplicate layer

* LLM: Update vLLM to v0.5.4 (#11746)

* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* update 0.5.4 api_server

* add dockerfile

* fix

* fix

* refine

* fix

---------

Co-authored-by: gc-fu <[email protected]>

* Add vllm-0.5.4 Dockerfile (#11838)

* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)

* Fix vLLM not convert issues (#11817) (#11918)

* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)

* init

* update mlp forward

* fix minicpm error in vllm 0.5.4

* fix dependabot alerts (#12008)

* Update 0.5.4 dockerfile (#12021)

* Add vllm awq loading logic (#11987)

* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error

* Enable Qwen padding mlp to 256 to support batch_forward (#12030)

* Enable padding mlp

* padding to 256

* update style

* Install 27191 runtime in 0.5.4 docker image (#12040)

* fix rebase error

* fix rebase error

* vLLM: format for 0.5.4 rebase (#12043)

* format

* Update model_convert.py

* Fix serving docker related modifications (#12046)

* Fix undesired modifications (#12048)

* fix

* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Wang, Jian4 <[email protected]>
Co-authored-by: liu-shaojun <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>