vLLM: Install 27191 runtime in 0.5.4 docker image #12040

xiangyuT · 2024-09-10T01:32:33Z

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

N/A
Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
Application test
Document test
...

5. New dependencies

New Python dependencies
- Dependency1
- Dependency2
- ...
New Java/Scala dependencies and their license
- Dependency1 and license1
- Dependency2 and license2
- ...

* Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * Remove duplicate layer * LLM: Update vLLM to v0.5.4 (#11746) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * update 0.5.4 api_server * add dockerfile * fix * fix * refine * fix --------- Co-authored-by: gc-fu <[email protected]> * Add vllm-0.5.4 Dockerfile (#11838) * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957) * Fix vLLM not convert issues (#11817) (#11918) * Fix not convert issues * refine Co-authored-by: Guancheng Fu <[email protected]> * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969) * init * update mlp forward * fix minicpm error in vllm 0.5.4 * fix dependabot alerts (#12008) * Update 0.5.4 dockerfile (#12021) * Add vllm awq loading logic (#11987) * [ADD] Add vllm awq loading logic * [FIX] fix the module.linear_method path * [FIX] fix quant_config path error * Enable Qwen padding mlp to 256 to support batch_forward (#12030) * Enable padding mlp * padding to 256 * update style * Install 27191 runtime in 0.5.4 docker image (#12040) * fix rebase error * fix rebase error * vLLM: format for 0.5.4 rebase (#12043) * format * Update model_convert.py * Fix serving docker related modifications (#12046) * Fix undesired modifications (#12048) * fix * Refine offline_inference arguments --------- Co-authored-by: Xiangyu Tian <[email protected]> Co-authored-by: Jun Wang <[email protected]> Co-authored-by: Wang, Jian4 <[email protected]> Co-authored-by: liu-shaojun <[email protected]> Co-authored-by: Shaojun Liu <[email protected]>

runtime 27191

20bb320

xiangyuT merged commit 16f1e36 into intel-analytics:ipex-vllm-mainline Sep 10, 2024

gc-fu pushed a commit that referenced this pull request Sep 10, 2024

Install 27191 runtime in 0.5.4 docker image (#12040)

d1e93f2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vLLM: Install 27191 runtime in 0.5.4 docker image #12040

vLLM: Install 27191 runtime in 0.5.4 docker image #12040

xiangyuT commented Sep 10, 2024

vLLM: Install 27191 runtime in 0.5.4 docker image #12040

vLLM: Install 27191 runtime in 0.5.4 docker image #12040

Conversation

xiangyuT commented Sep 10, 2024

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

5. New dependencies