
[Feature]: Support Mixtral-8x22B-v0.1 #3983

Closed
yh-yao opened this issue Apr 10, 2024 · 4 comments

Comments

@yh-yao

yh-yao commented Apr 10, 2024

🚀 The feature, motivation and pitch

Do we support running Mixtral-8x22B-v0.1 now? It takes very long for compiling on my side.
https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1

Alternatives

No response

Additional context

No response

@simon-mo
Collaborator

vLLM should be able to support it as-is, without modification. I have heard reports of people successfully running it. Out-of-the-box performance should be okay, but there is room for improvement in tuning the MoE kernels. cc @pcmoritz @richardliaw if your team has bandwidth to help tune.
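For anyone trying it, something along these lines should work out of the box with the standard vLLM Python API. This is only a sketch: the tensor_parallel_size value is an example and needs to match your own GPU setup, since the model is too large for a single device.

```python
from vllm import LLM, SamplingParams

# Mixtral-8x22B is large, so it has to be sharded across GPUs.
# tensor_parallel_size=8 is only an example; set it to however many
# GPUs you actually have with enough total memory for the weights.
llm = LLM(
    model="mistral-community/Mixtral-8x22B-v0.1",
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```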

@ywang96
Member

ywang96 commented Apr 11, 2024

Can confirm https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1 already works on latest PyPI release.

I've created #4002 to add the configs for the moe kernels - feel free to test them out.

@yh-yao
Author

yh-yao commented Apr 13, 2024

It is working now. vLLM works very well, although it requires more GPU memory than usual. Thank you very much for the help!

@yh-yao yh-yao closed this as completed Apr 13, 2024
@chenliverantos

Hi guys,

For the Mixtral 8x models, do you know whether the vLLM implementation supports the full sparse matmul optimization designed for MoE models, as specified in the original paper:
https://arxiv.org/pdf/2401.04088
More specifically, does it use the recommended MEGABLOCKS sparse matrix computation described here:
https://arxiv.org/pdf/2211.15841
in the form of the Blocked Compressed Sparse Row method?
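To be concrete, here is a rough sketch (plain PyTorch, hypothetical names and shapes) of the per-expert computation we are asking about. MEGABLOCKS expresses this gather/GEMM/scatter as a single blocked-CSR sparse matmul with no padding; this sketch is only an illustration of the concept, not vLLM's fused MoE kernel.

```python
import torch

# Toy sizes, purely illustrative.
num_tokens, hidden, ffn, num_experts, top_k = 8, 16, 32, 4, 2

x = torch.randn(num_tokens, hidden)            # token activations
w1 = torch.randn(num_experts, hidden, ffn)     # one weight matrix per expert
router_logits = torch.randn(num_tokens, num_experts)
topk_weights, topk_ids = torch.topk(router_logits.softmax(dim=-1), top_k)

out = torch.zeros(num_tokens, ffn)
for e in range(num_experts):
    # Gather the (token, slot) pairs routed to expert e. MEGABLOCKS fuses this
    # gather, the per-expert GEMM, and the scatter into one block-sparse matmul;
    # here it is an explicit loop for clarity.
    token_ids, slot = torch.where(topk_ids == e)
    if token_ids.numel() == 0:
        continue
    expert_out = x[token_ids] @ w1[e]          # dense GEMM for this expert's tokens
    out.index_add_(0, token_ids,
                   expert_out * topk_weights[token_ids, slot].unsqueeze(1))
```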

Does the vLLM implementation already have everything needed for this sparse matmul optimization, would it require additional infrastructure work to enable, or is it not possible to use it in the exact manner that the MEGABLOCKS parallel optimization specifies?

We have not been able to find sufficiently specific documentation anywhere in vLLM to make this clear. We would appreciate any information you can provide.

Best
