Megatron-LM BERT #18

Open
sj6077 opened this issue Apr 22, 2020 · 0 comments

Following the GPT2 tutorial (https://www.deepspeed.ai/tutorials/megatron/), I modified pretrain_bert.py to run with DeepSpeed. However, training fails in the backward pass with this message: RuntimeError: leaf variable has been moved into the graph interior.
Do you have any idea how I can fix this error?

The full error message is below.

elsa-03-ib0: Traceback (most recent call last):
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 617, in <module>
elsa-03-ib0: main()
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 595, in main
elsa-03-ib0: timers, args)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 354, in train
elsa-03-ib0: args, timers)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 310, in train_step
elsa-03-ib0: nsp_loss, args, timers)
elsa-03-ib0: File "/home/soojeong/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 255, in backward_step
elsa-03-ib0: model.backward(loss)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/deepspeed_light.py", line 665, in backward
elsa-03-ib0: self.optimizer.backward(loss)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/deepspeed_zero_optimizer.py", line 455, in backward
elsa-03-ib0: self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/loss_scaler.py", line 174, in backward
elsa-03-ib0: scaled_loss.backward(retain_graph=retain_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
elsa-03-ib0: torch.autograd.backward(self, gradient, retain_graph, create_graph)
elsa-03-ib0: File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
elsa-03-ib0: allow_unreachable=True) # allow_unreachable flag
elsa-03-ib0: RuntimeError: leaf variable has been moved into the graph interior
