May I know whether this issue is caused by not having enough memory?
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1622990) of binary: /home/changquan/anaconda3/envs/lang/bin/python
Traceback (most recent call last):
  File "/home/changquan/anaconda3/envs/lang/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
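This error can appear whenever a worker process exits abnormally, and many different factors can cause it. On its own, ChildFailedError only tells you that a worker exited with a nonzero code (here exitcode 1 on local_rank 0), so the underlying error is usually printed by the worker earlier in the log. You may be able to get a more detailed error message by prefixing the command with TORCH_DISTRIBUTED_DEBUG=DETAIL.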
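To help answer the memory question, the exit code in torchrun's "failed (exitcode: N)" line is a useful first clue. Below is a minimal sketch of a helper that reads that line and gives a hint. The function `describe_failure` is hypothetical and not part of PyTorch. The regex assumes the summary format shown in the log above, which may differ between PyTorch versions. The helper also assumes the POSIX convention that a negative exit code -N means the worker was killed by signal N, where SIGKILL (-9) on Linux often comes from the OOM killer when host RAM runs out. A positive code such as the 1 seen here usually means the worker raised a Python exception. That does not rule out memory, because a CUDA out-of-memory error is raised as an exception too, so check the worker's own traceback.

```python
import re
import signal

# Matches torchrun's failure summary, e.g.
# "...api:failed (exitcode: 1) local_rank: 0 (pid: 1622990) of binary: ..."
# (format assumed from the log in this issue; it may vary across versions)
_FAILED_RE = re.compile(r"failed \(exitcode: (-?\d+)\) local_rank: (\d+) \(pid: (\d+)\)")


def describe_failure(log_line: str) -> str:
    """Hypothetical helper: turn torchrun's failure summary into a hint.

    Assumes the POSIX convention that exit code -N means "killed by signal N".
    """
    match = _FAILED_RE.search(log_line)
    if match is None:
        return "no torchrun failure summary found"
    code, rank, pid = (int(g) for g in match.groups())
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:  # signal number unknown on this platform
            name = f"signal {-code}"
        hint = f"killed by {name}"
        if name == "SIGKILL":
            # The Linux OOM killer sends SIGKILL, so host RAM is a suspect.
            hint += " (on Linux this is often the OOM killer; check host memory)"
        return f"rank {rank} (pid {pid}) {hint}"
    # A positive code means the worker exited on its own, typically after an
    # uncaught Python exception (a CUDA out-of-memory error also ends up here),
    # so the real cause is in the worker's own traceback earlier in the log.
    return f"rank {rank} (pid {pid}) exited with code {code}; look for the worker's own traceback"


if __name__ == "__main__":
    line = ("ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) "
            "local_rank: 0 (pid: 1622990) of binary: /home/changquan/anaconda3/envs/lang/bin/python")
    print(describe_failure(line))
```

For the log in this issue, the helper reports that rank 0 exited with code 1. That points to an exception inside the training script rather than the process being killed from outside, so the next step is to scroll up to the worker's traceback or rerun with TORCH_DISTRIBUTED_DEBUG=DETAIL.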