Following this GPT2 tutorial (https://www.deepspeed.ai/tutorials/megatron/), I modified pretrain_bert to run with DeepSpeed. However, I got this error: RuntimeError: leaf variable has been moved into the graph interior.
Do you have any idea how I can fix this error?
The full error message is below.
elsa-03-ib0: Traceback (most recent call last):
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 617, in <module>
elsa-03-ib0: main()
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 595, in main
elsa-03-ib0: timers, args)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 354, in train
elsa-03-ib0: args, timers)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 310, in train_step
elsa-03-ib0: nsp_loss, args, timers)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 255, in backward_step
elsa-03-ib0: model.backward(loss)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/deepspeed_light.py", line 665, in backward
elsa-03-ib0: self.optimizer.backward(loss)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/deepspeed_zero_optimizer.py", line 455, in backward
elsa-03-ib0: self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/loss_scaler.py", line 174, in backward
elsa-03-ib0: scaled_loss.backward(retain_graph=retain_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
elsa-03-ib0: torch.autograd.backward(self, gradient, retain_graph, create_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
elsa-03-ib0: allow_unreachable=True) # allow_unreachable flag
elsa-03-ib0: RuntimeError: leaf variable has been moved into the graph interior