
Add vllm awq loading logic #11987

Merged

Conversation

ACupofAir
Contributor

Description

  • We use an environment variable to get the group size, because the quantization config is not accessible within ipex-llm (see the sketch after this list).
  • This PR applies main's Add vllm awq loading logic #11950 to the ipex-llm-mainline branch.
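
A minimal sketch of the environment-variable approach described above. The variable name IPEX_LLM_AWQ_GROUP_SIZE and the default of 128 are assumptions for illustration only; the actual name and default used by this PR may differ.

```python
import os

def get_awq_group_size(default: int = 128) -> int:
    """Read the AWQ group size from an environment variable.

    ipex-llm cannot see the model's quantization config at this point, so the
    group size is passed in from the environment instead. The variable name
    and default below are assumptions for illustration, not the values used
    in the actual PR.
    """
    value = os.environ.get("IPEX_LLM_AWQ_GROUP_SIZE")
    return default if value is None else int(value)
```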

@ACupofAir
Contributor Author

Tested on docker image vllm-ipex-054:0903. There are no exceptions in the output.
Results:

  1. chatglm3-6b, 1 card:
    (screenshot)

  2. llama2-13b, 2 cards:
    (screenshot)

@glorysdj requested a review from gc-fu on September 4, 2024 01:01
@gc-fu
Contributor

gc-fu commented Sep 4, 2024


The result of chatglm3-6b seems weird to me; it might be caused by not setting export BIGDL_LLM_SDP_IGNORE_MASK=0.

Please set this environment variable and test again. Also, please add one AWQ test result to the thread. For instance: https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ

@ACupofAir
Contributor Author


All results are reasonable now.

  1. Verification for the AWQ model (llama2-7b-chat-awq):
    (screenshot)
  2. Result for chatglm3-6b after export BIGDL_LLM_SDP_IGNORE_MASK=0:
    (screenshot)

Attention: to run the AWQ model, you need to apply this PR (analytics-zoo/vllm#29) for vLLM 0.5.4.
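
For reference, a minimal offline-inference sketch of the kind of AWQ verification described in this thread, assuming vLLM's standard Python API. The tests above were actually run inside the vllm-ipex-054:0903 docker image, so the exact entrypoint in the ipex-llm build may differ.

```python
import os

# Set before engine construction, as suggested earlier in the thread.
os.environ["BIGDL_LLM_SDP_IGNORE_MASK"] = "0"

from vllm import LLM, SamplingParams

# Load the AWQ checkpoint mentioned above; quantization="awq" asks vLLM to
# use its AWQ weight-loading path.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")

outputs = llm.generate(
    ["What is AI?"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```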

@gc-fu (Contributor) left a comment


LGTM

@gc-fu merged commit 56b8514 into intel-analytics:ipex-vllm-mainline on Sep 6, 2024
gc-fu pushed a commit that referenced this pull request Sep 10, 2024
* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error
gc-fu added a commit that referenced this pull request Sep 10, 2024
* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* Remove duplicate layer

* LLM: Update vLLM to v0.5.4 (#11746)

* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* update 0.5.4 api_server

* add dockerfile

* fix

* fix

* refine

* fix

---------

Co-authored-by: gc-fu <[email protected]>

* Add vllm-0.5.4 Dockerfile (#11838)

* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)

* Fix vLLM not convert issues (#11817) (#11918)

* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)

* init

* update mlp forward

* fix minicpm error in vllm 0.5.4

* fix dependabot alerts (#12008)

* Update 0.5.4 dockerfile (#12021)

* Add vllm awq loading logic (#11987)

* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error

* Enable Qwen padding mlp to 256 to support batch_forward (#12030)

* Enable padding mlp

* padding to 256

* update style

* Install 27191 runtime in 0.5.4 docker image (#12040)

* fix rebase error

* fix rebase error

* vLLM: format for 0.5.4 rebase (#12043)

* format

* Update model_convert.py

* Fix serving docker related modifications (#12046)

* Fix undesired modifications (#12048)

* fix

* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Wang, Jian4 <[email protected]>
Co-authored-by: liu-shaojun <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>