Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync huggingface modifications of qwen Moe model #4774

Merged
merged 6 commits into from
May 17, 2024

Conversation

eigen2017
Copy link
Contributor

@eigen2017 eigen2017 commented May 12, 2024

nearly huggingface merged my pr:https://github.com/huggingface/transformers/pull/30552/files

i introduced a new config "mlp_only_layers" to qwen Moe model,
i think vllm should keep the same model forward logic as huggingface model definations.
so this pr is for sync huggingface modifications of qwen Moe model.

the new config "mlp_only_layers" is to set cut which layers' experts, for fit to limited HBM or other creative scenarios.

  1. change in Qwen2MoeDecoderLayer is to sync huggingface modifications of the model.
  2. change in load_weights is when mlp_only_layers is not empty, "self.named_parameters()" do not have all weights(because some experts have been cut), it'll cause exception when doing "param = params_dict[name]".
  3. it's also a bug in the original version of qwen2_moe.py, when set decoder_sparse_step > 1, experts also been cut, and name is not in params_dict, params_dict[name] will cause expceptions.
  4. what ever, check the params_dict's key "name" exsist or not before read params_dict[name] is safer.

i'm willing to contribute to great vllm ~~ any reply is welcomed~ thks ^_^

@eigen2017
Copy link
Contributor Author

i'll fix the CI warns soon.

@eigen2017
Copy link
Contributor Author

this issue can be solved:
#4369

and these issues can be solved when using MoE models:
#3931
#3563

@eigen2017
Copy link
Contributor Author

@WoosukKwon @zhuohan123
could you please arrage someone to review my pr ?
thanks~~

@eigen2017
Copy link
Contributor Author

amd test said:
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json

Failed to connect to localhost port 8000: Connection refused
timeout 600 bash -c 'until curl localhost:8000/v1/models; do sleep 1; done'

it's likely to be a bug of the scanner, i try it again

@eigen2017
Copy link
Contributor Author

@WoosukKwon @zhuohan123 now ci passed, could you please give a review?thanks ^_^

@simon-mo
Copy link
Collaborator

simon-mo commented May 14, 2024

@JustinLin610 can you help take a look at this? Thank you! 🙏

@eigen2017
Copy link
Contributor Author

@JustinLin610 can you help take a look at this? Thank you! 🙏

thank you for replying!

@JustinLin610 HI ^_^ ,qwen member, could you please give this pr a review? thanks!

@eigen2017
Copy link
Contributor Author

eigen2017 commented May 15, 2024

@JustinLin610 @yangapku @simonJJJ @logicwong @JianxinMa @hzhwcmhf @fyabc @huybery
各位阿里千问member灯塔级大神,谁能帮忙确认一下这个pr嘛?
欢迎帮忙提供任何观点,多谢啦!!

Copy link

@bozheng-hit bozheng-hit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is functionally okay.

@eigen2017
Copy link
Contributor Author

I think the PR is functionally okay.

thank you for your approvement!!

@simon-mo HI ~ it seems qwen members are all very busy ...

@JustinLin610
Copy link
Contributor

bo is our member. yes, i just noticed that this is merged into transformers. somehow it is not necessary cuz we do not have this setup for our models, but functionally it is ok. i think it is okay to merge it. @eigen2017 @simon-mo

@eigen2017
Copy link
Contributor Author

bo is our member. yes, i just noticed that this is merged into transformers. somehow it is not necessary cuz we do not have this setup for our models, but functionally it is ok. i think it is okay to merge it. @eigen2017 @simon-mo

thank you very much!yes it's functional ok and don't change anything if not set.

@simon-mo if anything else needed for merg this pr please tell me,

@simon-mo simon-mo merged commit 48d5985 into vllm-project:main May 17, 2024
55 checks passed
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
@78
Copy link

78 commented May 24, 2024

config.mlp_only_layers should have a default value.

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants