Sync Hugging Face modifications of the Qwen MoE model #4774
Conversation
I'll fix the CI warnings soon.
@WoosukKwon @zhuohan123
The AMD test failed with "Failed to connect to localhost port 8000: Connection refused". It's likely a bug in the scanner; I'll try it again.
@WoosukKwon @zhuohan123 CI has passed now. Could you please give this a review? Thanks ^_^
@JustinLin610 can you help take a look at this? Thank you! 🙏
Thank you for replying! @JustinLin610 Hi ^_^, as a Qwen member, could you please give this PR a review? Thanks!
@JustinLin610 @yangapku @simonJJJ @logicwong @JianxinMa @hzhwcmhf @fyabc @huybery
I think the PR is functionally okay.
Thank you for your approval!! @simon-mo Hi ~ it seems the Qwen members are all very busy...
Bo is our member. Yes, I just noticed that this is merged into transformers. Strictly speaking it is not necessary, since we do not use this setup for our models, but functionally it is OK. I think it is okay to merge it. @eigen2017 @simon-mo
Thank you very much! Yes, it's functionally OK and changes nothing if the option is not set. @simon-mo, if anything else is needed to merge this PR, please tell me.
Recently, Hugging Face merged my PR: https://github.com/huggingface/transformers/pull/30552/files
It introduces a new config option, "mlp_only_layers", for the Qwen MoE model.
vLLM should keep the same model forward logic as the Hugging Face model definitions, so this PR syncs those Hugging Face modifications of the Qwen MoE model.
The new "mlp_only_layers" option selects layers whose experts are cut out (replaced by a plain MLP), to fit limited HBM or other creative scenarios.
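For illustration, here is a minimal sketch of the layer-selection rule this option enables, following the logic in the Hugging Face Qwen2 MoE decoder layer (the config field names come from that PR; the `MoeConfig` dataclass and `layer_uses_moe` helper are hypothetical stand-ins for illustration, not actual vLLM or transformers APIs):

```python
from dataclasses import dataclass, field
from typing import List

# Minimal stand-in for the relevant Qwen2 MoE config fields.
@dataclass
class MoeConfig:
    num_experts: int = 60
    decoder_sparse_step: int = 1
    mlp_only_layers: List[int] = field(default_factory=list)

def layer_uses_moe(layer_idx: int, config: MoeConfig) -> bool:
    """A layer listed in `mlp_only_layers` always uses a plain MLP,
    so its experts never need to be loaded into HBM."""
    return (
        layer_idx not in config.mlp_only_layers
        and config.num_experts > 0
        and (layer_idx + 1) % config.decoder_sparse_step == 0
    )

# Example: cut the experts of the first two layers only.
cfg = MoeConfig(mlp_only_layers=[0, 1])
print([layer_uses_moe(i, cfg) for i in range(4)])  # [False, False, True, True]
```

With an empty `mlp_only_layers` list (the default), every layer keeps its sparse MoE block, which is why the change is a no-op for existing checkpoints.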
I'm willing to contribute to the great vLLM project ~~ any reply is welcome! Thanks ^_^