vLLM: Update 0.5.4 dockerfile #12021

xiangyuT · 2024-09-05T05:33:58Z

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

N/A
Unit test: Please manually trigger the PR Validation by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
Application test
Document test
...

5. New dependencies

New Python dependencies
- Dependency1
- Dependency2
- ...
New Java/Scala dependencies and their license
- Dependency1 and license1
- Dependency2 and license2
- ...

hzjane

LGTM

* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix

---------

Co-authored-by: gc-fu <[email protected]>

* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not convert issues (#11817) (#11918)
* Fix not convert issues
* refine

Co-authored-by: Guancheng Fu <[email protected]>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030)
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Wang, Jian4 <[email protected]>
Co-authored-by: liu-shaojun <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>

update

3e9c356

hzjane approved these changes Sep 5, 2024

xiangyuT merged commit f008ea0 into intel-analytics:ipex-vllm-mainline Sep 5, 2024

gc-fu pushed a commit that referenced this pull request Sep 10, 2024

Update 0.5.4 dockerfile (#12021)

058b83c
