[BUG] An error occurs due to mismatched shapes during the process of splitting mixed_x_layer #304
Comments
I encountered the following error.
I encountered the same issue with the code in the latest main branch, and I'm unable to fix the problem. However, the code works when switching to the commit with the hash 2348eed on Nov 17, 2023.
I also encountered the same issue with the code in the latest main branch.
Thank you all for your comments. I see that a related PR, #307, was opened three days ago. It may be worth referring to.
I am currently pretraining GPT, and I encountered an error in the split_tensor function in megatron/model/transformer.py. The split_tensor function is documented as transforming [sq, b, nkv, (nq // nkv + 2), hn] into 3 [sq, b, np, hn] tensors. When reshaping query_layer, I think it is correct to use `mixed_x_layer.shape[:-2]` instead of `mixed_x_layer.shape[:-1]`.
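To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It is not the code from megatron/model/transformer.py: the sizes are made up, `infer_reshape` is a hypothetical helper that mimics how a tensor reshape resolves a single `-1`, and the target form `shape[:k] + (-1, hn)` is an assumption about how the query slice is reshaped.

```python
from math import prod


def infer_reshape(num_elements, target):
    """Resolve a single -1 in `target` the way a tensor reshape would.

    Hypothetical helper for illustration only; raises ValueError when the
    element count cannot be divided evenly into the target shape.
    """
    if target.count(-1) != 1:
        raise ValueError("expected exactly one -1 in the target shape")
    known = prod(d for d in target if d != -1)
    if known == 0 or num_elements % known != 0:
        raise ValueError(f"cannot reshape {num_elements} elements into {target}")
    return tuple(num_elements // known if d == -1 else d for d in target)


# Made-up sizes (assumptions): sequence length, batch, KV heads, query heads, head dim.
sq, b, nkv, nq, hn = 4, 2, 2, 8, 16

# Documented input layout: [sq, b, nkv, (nq // nkv + 2), hn]
mixed_shape = (sq, b, nkv, nq // nkv + 2, hn)

# The query part drops the key and value slots from the fourth dimension.
query_elements = sq * b * nkv * (nq // nkv) * hn

# Using shape[:-2] keeps (sq, b, nkv), so -1 resolves to nq // nkv.
ok_shape = infer_reshape(query_elements, mixed_shape[:-2] + (-1, hn))

# Using shape[:-1] also keeps the (nq // nkv + 2) dimension, which the
# query slice no longer has, so the element counts cannot match.
try:
    infer_reshape(query_elements, mixed_shape[:-1] + (-1, hn))
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)

# Collapsing to the documented output layout [sq, b, np, hn]
# (here np equals nq, assuming no tensor-parallel split).
flat_shape = infer_reshape(query_elements, (sq, b, -1, hn))

print(ok_shape, flat_shape, bad_error)
```

Under these assumptions the `shape[:-1]` target fails because it retains the `nq // nkv + 2` dimension, while the `shape[:-2]` target resolves cleanly. Whether the final layout should be 5-D or the documented 4-D `[sq, b, np, hn]` depends on the actual function body, which is not shown in this thread.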