[Bug]: Paligemma does not work with tensor parallelism #6910
Comments
Hey @siddvenk, thanks for reporting this issue! I will look into this. Just to confirm: the same model works with TP=1 but not TP=2, correct? (I can verify this later myself too; I was just wondering if I could get a quick answer.)
@ywang96 we tested with TP=4 and it doesn't work. We didn't test with TP=2.
Got it - there might have been some changes in distributed ops that I wasn't aware of. I will look into this and patch a fix. Sorry for the trouble!
@ywang96 I also tested with TP=2 and it doesn't work. It's the same error as with TP=4, just different numbers in the broadcast exception.
Hey @lanking520 @siddvenk! This bug was due to an oversight on our part: we used a TP-sharded layer in the multimodal projector even though we don't yet support sharding it. It should be fixed by #6930. It wasn't caught previously because we don't have a TP test for VLMs, but we should definitely add one when we have more credits for CI. Sorry for the inconvenience!
Appreciate the quick investigation and fix here!
Your current environment
🐛 Describe the bug
Model: google/paligemma-3b-mix-448
Issue: An exception is raised during the warmup phase when tensor parallelism is used.
Reproducible example:
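(The original snippet was not preserved in this copy of the issue. A minimal sketch that should reproduce the failure, assuming vLLM's offline `LLM` API and a machine with at least 2 GPUs; the crash occurs during engine warmup, so constructing the engine is enough to trigger it:)

```python
# Hypothetical repro sketch, not the original snippet from the issue.
# Requires vLLM installed and >= 2 visible GPUs.
from vllm import LLM

# TP=1 works; any tensor_parallel_size > 1 raises a broadcast
# exception during the warmup phase.
llm = LLM(
    model="google/paligemma-3b-mix-448",
    tensor_parallel_size=2,
)
```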
Exception:
This model works without tensor parallelism, but any tensor parallel size greater than 1 triggers the exception above.