
Fix vLLM not convert issues #11817

Merged: 2 commits merged into intel-analytics:main on Aug 15, 2024
Conversation

gc-fu (Contributor) commented Aug 15, 2024

Description

This fixes an issue where a layer is not converted in cases where in_features % 64 != 0.

This usually happens when tensor parallelism (TP) is used, since each rank holds only a shard of the layer's weights.
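
For context, a minimal sketch of how the unconverted-layer case can arise; the helper name and the 13696-wide example are illustrative assumptions, not the actual ipex-llm convert code:

```python
# Minimal sketch (hypothetical helper, not the actual ipex-llm convert code)
# of why a "% 64" alignment check can fail under tensor parallelism (TP).

def should_convert_low_bit(full_in_features: int, tp_size: int = 1) -> bool:
    """Return True if the per-rank weight shard meets the 64-alignment requirement."""
    # Under TP the weight is split across ranks, so each rank only sees
    # full_in_features / tp_size input features.
    per_rank_in_features = full_in_features // tp_size
    return per_rank_in_features % 64 == 0


# A width that is 64-aligned on a single card can lose that alignment once sharded:
print(should_convert_low_bit(13696, tp_size=1))  # True  (13696 % 64 == 0)
print(should_convert_low_bit(13696, tp_size=4))  # False (3424 % 64 == 32)
```

Previously, layers that fell into the `False` branch could be left unconverted; this PR makes sure they are handled during conversion.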

gc-fu (Contributor, Author) commented Aug 15, 2024

@gc-fu gc-fu requested a review from liu-shaojun August 15, 2024 10:52
@gc-fu gc-fu merged commit e70ae06 into intel-analytics:main Aug 15, 2024
1 check passed
ACupofAir pushed a commit to ACupofAir/BigDL that referenced this pull request Aug 26, 2024
gc-fu added a commit that referenced this pull request Aug 30, 2024
* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>
gc-fu added a commit that referenced this pull request Sep 10, 2024
* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>
gc-fu added a commit that referenced this pull request Sep 10, 2024
* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* Remove duplicate layer

* LLM: Update vLLM to v0.5.4 (#11746)

* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* update 0.5.4 api_server

* add dockerfile

* fix

* fix

* refine

* fix

---------

Co-authored-by: gc-fu <[email protected]>

* Add vllm-0.5.4 Dockerfile (#11838)

* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)

* Fix vLLM not convert issues (#11817) (#11918)

* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <[email protected]>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)

* init

* update mlp forward

* fix minicpm error in vllm 0.5.4

* fix dependabot alerts (#12008)

* Update 0.5.4 dockerfile (#12021)

* Add vllm awq loading logic (#11987)

* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error

* Enable Qwen padding mlp to 256 to support batch_forward (#12030)

* Enable padding mlp

* padding to 256

* update style

* Install 27191 runtime in 0.5.4 docker image (#12040)

* fix rebase error

* fix rebase error

* vLLM: format for 0.5.4 rebase (#12043)

* format

* Update model_convert.py

* Fix serving docker related modifications (#12046)

* Fix undesired modifications (#12048)

* fix

* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Wang, Jian4 <[email protected]>
Co-authored-by: liu-shaojun <[email protected]>
Co-authored-by: Shaojun Liu <[email protected]>