[Feature]: Support Mixtral-8x22B-v0.1 #3983
Comments
vLLM should be able to support it as-is, without modification. I have heard reports of people successfully running it. Out-of-the-box performance should be OK, but there is room for improvement in tuning the MoE kernels. cc @pcmoritz @richardliaw if your team has bandwidth to help tune.
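For context on what "tuning the MoE kernels" refers to: vLLM's fused MoE kernels read tuned launch parameters from per-GPU config files keyed by batch size. The sketch below is only an illustration of that general shape; the key names and values here are assumptions for the sake of the example, not copied from the repository.

```python
# Illustrative sketch only: keys and values are placeholders, not real tuned
# configs. Each batch size maps to a set of kernel launch parameters tuned
# for a specific GPU and expert shape.
example_moe_kernel_config = {
    "1":  {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 32,
           "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 4},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}
```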
Can confirm https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1 already works on the latest PyPI release. I've created #4002 to add the configs for the MoE kernels; feel free to test them out.
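For anyone landing here, a minimal sketch of running the model with vLLM's offline Python API follows. The `tensor_parallel_size` and sampling settings are assumptions to adapt to your own hardware; the 8x22B weights are far too large for a single GPU.

```python
from vllm import LLM, SamplingParams

# Minimal sketch, not an official recipe. The model name is taken from the
# thread above; tensor_parallel_size=8 is an assumption -- set it to however
# many GPUs you actually have available.
llm = LLM(
    model="mistral-community/Mixtral-8x22B-v0.1",
    tensor_parallel_size=8,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Explain mixture-of-experts in one paragraph."],
    sampling_params,
)
for output in outputs:
    print(output.outputs[0].text)
```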
It is working now. vLLM is very good, although it requires more GPU memory than the normal setup. Thank you very much for the help!
Hi guys, for the Mixtral 8x models, do you know whether the vLLM implementation supports the full sparse-matmul optimization designed for MoE models, as described in the original MegaBlocks paper? Does the vLLM implementation already have everything enabled for that sparse-matmul optimization, would it need additional infrastructure to enable it, or is it simply not possible to use it in exactly the manner the MegaBlocks parallel optimization specifies? We have not been able to find sufficiently specific documentation anywhere in vLLM to make this clear. We would appreciate any information you can provide. Best
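Not an answer about vLLM's internals, but for readers unfamiliar with the term: the "sparse matmul" in MoE inference means each token is multiplied only by the weights of the experts it is routed to, not by every expert. The toy PyTorch sketch below (names and shapes are illustrative, and it uses a simplified single-matrix FFN rather than Mixtral's gated SwiGLU) shows the naive per-expert loop that block-sparse or grouped-GEMM kernels such as MegaBlocks are designed to replace with a single fused operation.

```python
import torch
import torch.nn.functional as F

def naive_moe_forward(x, gate_w, w_in, w_out, top_k=2):
    """Toy top-k MoE layer (illustrative only, not vLLM's implementation).

    x:      [num_tokens, hidden]
    gate_w: [hidden, num_experts]        router weights
    w_in:   [num_experts, hidden, ffn]   per-expert up projection
    w_out:  [num_experts, ffn, hidden]   per-expert down projection
    """
    scores = F.softmax(x @ gate_w, dim=-1)                   # [tokens, experts]
    weights, expert_ids = torch.topk(scores, top_k, dim=-1)  # route each token to top_k experts
    out = torch.zeros_like(x)
    for e in range(w_in.shape[0]):
        token_ids, slot = (expert_ids == e).nonzero(as_tuple=True)
        if token_ids.numel() == 0:
            continue
        # Only the tokens routed to expert e ever touch its weights -- this is
        # the sparsity that block-sparse / grouped GEMM kernels exploit.
        h = F.silu(x[token_ids] @ w_in[e]) @ w_out[e]
        out.index_add_(0, token_ids, h * weights[token_ids, slot].unsqueeze(-1))
    return out
```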
🚀 The feature, motivation and pitch
Do we support running Mixtral-8x22B-v0.1 now? It takes very long to compile on my side.
https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1
Alternatives
No response
Additional context
No response