
I have only one GPU; running train_refine.py fails with RuntimeError: Distributed package doesn't have NCCL built in #211

Open
Zhang278888 opened this issue Nov 12, 2024 · 0 comments

Comments

@Zhang278888

D:\software\Anaconda3\envs\bg_matting\python.exe "D:\software\JetBrains\PyCharm 2022.1.3\plugins\python\helpers\pydev\pydevd.py" --multiprocess --qt-support=auto --client 127.0.0.1 --port 12569 --file D:/_zzw_work/gouged/py_project/BackgroundMattingV2/train_refine_ori.py --dataset-name videomatte240k --model-backbone resnet101 --model-name refine_resnet101 --model-last-checkpoint D:\_zzw_work\gouged\py_project\BackgroundMattingV2\weights\torchscript_resnet101_fp32.pth --epoch-end 2000
Connected to pydev debugger (build 221.5921.27)
Traceback (most recent call last):
File "D:\software\JetBrains\PyCharm 2022.1.3\plugins\python\helpers\pydev\pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\software\JetBrains\PyCharm 2022.1.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/_zzw_work/gouged/py_project/BackgroundMattingV2/train_refine_ori.py", line 312, in <module>
mp.spawn(train_worker,
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\multiprocessing\spawn.py", line 246, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\multiprocessing\spawn.py", line 202, in start_processes
while not context.join():
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\multiprocessing\spawn.py", line 163, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\multiprocessing\spawn.py", line 74, in _wrap
fn(i, *args)
File "D:\_zzw_work\gouged\py_project\BackgroundMattingV2\train_refine_ori.py", line 82, in train_worker
dist.init_process_group("nccl", rank=rank, world_size=distributed_num_gpus)
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "D:\software\Anaconda3\envs\bg_matting\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
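
For context on the failing call: the traceback ends at dist.init_process_group("nccl", ...), and as far as I know the PyTorch builds for Windows ship without NCCL, so the "nccl" backend cannot be initialised there no matter how many GPUs are present. A possible workaround, not confirmed by the maintainers, is to fall back to the "gloo" backend when NCCL is missing. The sketch below only illustrates the backend choice. pick_backend and nccl_available are hypothetical helper names and not part of BackgroundMattingV2.

```python
import sys


def pick_backend(nccl_available: bool, platform: str = sys.platform) -> str:
    """Return "nccl" only where it can work, otherwise fall back to "gloo".

    Hypothetical helper: NCCL is treated as unusable on Windows.
    """
    if nccl_available and not platform.startswith("win"):
        return "nccl"
    return "gloo"


def nccl_available() -> bool:
    """Ask the installed PyTorch build whether it was compiled with NCCL."""
    try:
        import torch.distributed as dist
    except ImportError:
        # No PyTorch at all, so NCCL cannot be available either.
        return False
    return dist.is_available() and dist.is_nccl_available()


# Assumed usage inside train_worker, replacing the hard-coded "nccl":
#   backend = pick_backend(nccl_available())
#   dist.init_process_group(backend, rank=rank, world_size=distributed_num_gpus)
```

With a single GPU the process group still has to be created if the script wraps the model in DistributedDataParallel, so changing only the backend string is the smallest edit. Whether the rest of train_refine.py runs correctly under gloo on Windows is untested here.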
