torch.distributed.elastic.multiprocessing.errors.ChildFailedError: #113

Closed
SHITIANYU-hue opened this issue Apr 26, 2023 · 1 comment

Comments

@SHITIANYU-hue

May I know whether this error means I do not have enough memory?

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1622990) of binary: /home/changquan/anaconda3/envs/lang/bin/python
Traceback (most recent call last):
  File "/home/changquan/anaconda3/envs/lang/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/changquan/anaconda3/envs/lang/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

@Facico
Owner

Facico commented Apr 26, 2023

This error can appear whenever a worker process exits abnormally, so it has many possible causes, and the ChildFailedError alone does not tell you which one applies. Try setting TORCH_DISTRIBUTED_DEBUG=DETAIL in front of your command to get a more detailed error message.
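If you would rather set the variable from Python than type it in the shell, something like the following should work. It is only a minimal sketch: `run_with_debug` is a hypothetical helper name, not part of PyTorch, and the child command here is a stand-in that just prints the variable, so replace it with your own `torchrun` invocation.

```python
import os
import subprocess
import sys


def run_with_debug(cmd, level="DETAIL"):
    """Run `cmd` with TORCH_DISTRIBUTED_DEBUG set in the child's environment.

    `run_with_debug` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)                 # copy, so the parent environment is untouched
    env["TORCH_DISTRIBUTED_DEBUG"] = level
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: it only echoes the variable it received.
    # Swap this list for your real launch command.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['TORCH_DISTRIBUTED_DEBUG'])",
    ]
    result = run_with_debug(child)
    print(result.stdout.strip())
    print("exit code:", result.returncode)
```

The shell form is simply the variable assignment written before the command on the same line. Either way, the extra output goes to the worker logs, so check the lines printed before the ChildFailedError for the actual failure.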

@Facico Facico closed this as completed Jun 29, 2023