@eric-weiss-zyphra discovered that upstream Megatron-LM is still on the old dataloader scheme (as opposed to gpt-neox), leading to overflow errors like:
```
File "torch/utils/data/_utils/collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [2049] at entry 0 and [198705720] at entry 16
```
I created a solution for this a while back in EleutherAI/gpt-neox#835. We should port it to Megatron-LM, verify that it works, and then contribute it back upstream.
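The bogus sample size in the traceback (`[198705720]` where every entry should be `[2049]`, i.e. seq_len + 1) is the signature of integer wraparound: once the token corpus exceeds 2**31 tokens, 32-bit sample offsets silently overflow and the computed sample lengths become garbage. A minimal sketch of the failure mode and the shape of the fix, assuming the index arrays are plain offset arrays (the variable names here are illustrative, not Megatron-LM's actual ones):

```python
import numpy as np

# Hypothetical reproduction of the overflow: sample boundary offsets for a
# corpus larger than the int32 range. seq_len + 1 tokens per sample, as in
# the [2049] tensors from the traceback.
seq_len = 2048
num_tokens = 3 * 2**31  # corpus with more than 2**31 tokens

# Old scheme (sketch): offsets narrowed to int32 wrap past 2**31 - 1,
# so end - start no longer equals seq_len + 1 for every sample.
bad_offsets = np.arange(0, num_tokens, seq_len + 1, dtype=np.int64).astype(np.int32)

# Fixed scheme (sketch): keep offsets in int64 end to end.
good_offsets = np.arange(0, num_tokens, seq_len + 1, dtype=np.int64)

# Every sample computed from int64 offsets has the expected size...
assert (np.diff(good_offsets) == seq_len + 1).all()
# ...while the wrapped int32 offsets produce at least one bogus sample size.
assert not (np.diff(bad_offsets.astype(np.int64)) == seq_len + 1).all()
```

The gpt-neox fix referenced above addresses this class of bug; the sketch only illustrates why widening the index dtype is the relevant change, not the exact code from that PR.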