Mixtral fused MoE: Fix multi-GPU #341
Comments
It's weird. I added exactly the device guard code above to AutoAWQ_kernels and modified the multi-GPU part. Running
Okay, that is interesting then! That would suggest one of the GPUs I rented had an issue. Let me try again, and thanks for testing it out! EDIT: Did you also modify the code here to allow fusing with the new modules? https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/mixtral.py#L130
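For context, the change being asked about swaps the stock Mixtral sparse-MoE block for the fused AWQ module in each decoder layer. The sketch below is purely illustrative: `fuse_moe_blocks` and `from_sparse` are hypothetical names, and the real logic in awq/models/mixtral.py is structured differently.

```python
# Illustrative only: hypothetical helper, not AutoAWQ's actual fuser code.
def fuse_moe_blocks(model, fused_moe_cls):
    # Walk the HF Mixtral decoder layers and replace each sparse-MoE block with
    # the fused module, so the fused multi-GPU kernels are actually exercised.
    for layer in model.model.layers:
        layer.block_sparse_moe = fused_moe_cls.from_sparse(  # hypothetical constructor
            layer.block_sparse_moe
        )
```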
OK, I tested it on 2x 4090, and it seems fixed! Thanks for making the suggestion and following through with testing it.
The previous issue is now fixed on the main branch and published in the new AutoAWQ-kernels package on PyPI. However, it seems that the Triton kernel fails in the same way when the last layer is executed on a non-default device.
I think the Triton kernel also needs a GPU device context, similar to this.
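As an illustration of the suggestion, the pattern is to enter the input tensor's device context before launching the Triton kernel, mirroring the device guard added on the CUDA side. The kernel below is a minimal stand-in, not AutoAWQ's actual dequantization kernel.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Trivial element-wise copy, used only to demonstrate the launch pattern.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    tl.store(dst_ptr + offsets, tl.load(src_ptr + offsets, mask=mask), mask=mask)

def launch_on_right_device(src: torch.Tensor) -> torch.Tensor:
    dst = torch.empty_like(src)
    n = src.numel()
    grid = (triton.cdiv(n, 1024),)
    # Without this context, Triton launches on the current default device (cuda:0);
    # if src lives on another GPU, the kernel dereferences foreign pointers and
    # triggers the illegal memory access seen above.
    with torch.cuda.device(src.device):
        copy_kernel[grid](src, dst, n, BLOCK_SIZE=1024)
    return dst

# e.g. launch_on_right_device(torch.randn(4096, device="cuda:1"))
```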
You are right, this fixed it. After more careful benchmarking with different problem sizes, I found that dequantizing the large stacked weights increases memory usage without any speedup in prefill, so I am removing it and simplifying the forward pass. Thanks for all your hard work and guidance @chu-tianxiang; I will try to make the best of it in AutoAWQ and get the MoE fused modules into transformers as well.
Currently, multi-GPU is not supported because it causes an illegal memory access error. I believe the error comes from `moe_alig_block_size`.

Kernels installed from: https://github.com/casper-hansen/AutoAWQ_kernels

Attempted solutions

Example: `python examples/generate.py` (modify `quant_path = "casperhansen/mixtral-instruct-awq"`)

Solutions: add a device guard around `moe_alig_block_size`.
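For reference, a minimal multi-GPU repro along the lines of the example above might look like the sketch below. It assumes AutoAWQ's standard `from_quantized` loading API and that a `device_map` can shard layers across GPUs in the installed version; the real examples/generate.py differs in its details.

```python
# Hedged repro sketch, not the actual examples/generate.py.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "casperhansen/mixtral-instruct-awq"

model = AutoAWQForCausalLM.from_quantized(
    quant_path,
    fuse_layers=True,    # use the fused MoE modules discussed in this thread
    device_map="auto",   # shard layers across all visible GPUs (assumed supported)
)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=32)
# Before the fix, generation crashed with an illegal memory access once a layer
# placed on a non-default GPU hit the fused MoE kernels.
print(tokenizer.decode(out[0]))
```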