
fix attn_mask #50

Merged
stas00 merged 1 commit into main from attn_mask_bool_fix on Aug 5, 2021
Conversation

stas00 (Contributor) commented Aug 5, 2021

Suddenly the training won't work anymore, failing with:

Traceback (most recent call last): 
 File "pretrain_gpt.py", line 215, in <module>
   pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/training.py", line 144, in pretrain
   iteration = train(forward_step_func,
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/training.py", line 675, in train
   train_step(forward_step_func,
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/training.py", line 381, in train_step
   loss = model[0].train_batch(data_iter=data_iterator)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 291, in train_batch
   self._exec_schedule(sched)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 1237, in _exec_schedule
   self._exec_instr(**cmd.kwargs)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 587, in _exec_forward_pass
   outputs = super().forward(inputs)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/engine.py", line 1149, in forward
   loss = self.module(*inputs, **kwargs)
 File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
   return forward_call(*input, **kwargs)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/pipe/module.py", line 332, in forward
   x = func(forward_input)
 File "/home/stas/github/00optimize/deepspeed-big-science/deepspeed/runtime/pipe/module.py", line 325, in exec_func
   inputs = layer(inputs)
 File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
   return forward_call(*input, **kwargs)
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/model/transformer.py", line 582, in forward
   return super().forward(hidden_states, attention_mask, **kwargs)
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/model/transformer.py", line 474, in forward
   self.self_attention(layernorm_output,
 File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
   return forward_call(*input, **kwargs)
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/model/transformer.py", line 328, in forward
   attention_probs = self.scale_mask_softmax(attention_scores,
 File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
   return forward_call(*input, **kwargs)
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/model/fused_softmax.py", line 155, in forward
   mask_output = self.mask_func(input, mask) if mask is not None else input
 File "/mnt/nvme1/code/huggingface/Megatron-DeepSpeed-master/megatron/model/utils.py", line 43, in attention_mask_func
   attention_scores.masked_fill_(attention_mask, -10000.0)
RuntimeError: expected mask dtype to be Bool but got Half

It was because the mask stashed in args wasn't boolean!
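
For anyone hitting the same error: masked_fill_ only accepts a boolean mask, so a fp16 (or any non-bool) mask trips exactly this check. A minimal standalone repro using only stock PyTorch (illustrative code, not from this PR):

import torch

scores = torch.randn(1, 1, 4, 4).half()                # fp16 attention scores
mask = torch.triu(torch.ones(1, 1, 4, 4), diagonal=1)  # 0/1 float mask, 1 = masked

try:
    scores.masked_fill_(mask.half(), -10000.0)         # Half mask -> RuntimeError
except RuntimeError as e:
    print(e)  # "expected mask dtype to be Bool but got Half"

scores.masked_fill_(mask.bool(), -10000.0)             # bool mask: fills masked positions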

Thanks a million to @tjruwase who helped me to figure it out!

I still can't figure out why it suddenly started failing today when it hadn't been failing before.
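
The shape of the fix (a sketch only; the actual one-commit diff isn't reproduced on this page) is to force the stashed mask back to bool before masked_fill_ ever sees it, e.g. with a hypothetical helper like:

import torch

def ensure_bool_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper, not the PR's code: binarize a 0/1 float/half mask
    # (as megatron.utils.get_ltor_masks_and_position_ids builds it) so that
    # attention_mask_func's masked_fill_ accepts it. True = mask this position.
    if attention_mask.dtype != torch.bool:
        attention_mask = attention_mask < 0.5
    return attention_mask

Applying this at the point where the mask is stashed in (and read back from) args would guarantee the dtype regardless of which code path produced the mask.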

cc: @ShadenSmith

stas00 merged commit 42fe3b3 into main on Aug 5, 2021
stas00 deleted the attn_mask_bool_fix branch on Aug 5, 2021 at 02:02
stas00 added a commit that referenced this pull request on Aug 5, 2021