Replies: 1 comment
Switch to torch 2.1.x.
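Before and after switching, it can help to confirm which torch build the active environment actually has. The following is a minimal sketch, not an official check: the helper names `installed_torch_version` and `is_torch_2_1` are hypothetical, and it assumes the package is installed under the distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


def is_torch_2_1(version_string):
    """True if the version is in the 2.1.x series (local tags like +cu118 ignored)."""
    public = version_string.split("+", 1)[0]  # drop a local build tag if present
    return public.split(".")[:2] == ["2", "1"]
```

For example, `is_torch_2_1(installed_torch_version() or "")` is true only when a 2.1.x build is installed in the environment that `torchrun` runs from.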
[2024-02-18 17:45:10,448] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:697] [c10d] The client socket has failed to connect to [LAPTOP-N2VGJ0V0]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\ASUS\anaconda3\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\run.py", line 812, in main
    run(args)
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\launcher\api.py", line 259, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 123, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 727, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 862, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 123, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 699, in _initialize_workers
    self._rendezvous(worker_group)
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\metrics\api.py", line 123, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 542, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\anaconda3\Lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.distributed.DistNetworkError: Unknown error
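The log shows the client socket failing to connect to the machine's own hostname on port 29500 with Winsock error 10049, before `TCPStore` raises. One way to narrow this down is to check what that hostname resolves to and whether a plain TCP connection to a given address succeeds. The sketch below uses only the Python standard library. The helper names `resolve` and `can_connect` are hypothetical, and it is an assumption, not something the thread confirms, that hostname resolution is the cause here.

```python
import socket


def resolve(host, port):
    """Return the distinct IP addresses `host` resolves to ([] if it does not resolve)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry's sockaddr tuple starts with the IP address string.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `resolve(socket.gethostname(), 29500)` lists the addresses the rendezvous client would try for the local hostname. `can_connect` only returns `True` while something is listening on the port, so it is useful mainly for comparing the hostname against `127.0.0.1` during a running rendezvous.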