This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
As the title says, the GUI cannot be opened after I resume the experiment.
I observed that whenever the output log does not stay at "Web portal URLs: ...", the GUI cannot be opened. However, I can't find a way to keep the "nnictl resume ID" command running.
Complete log of command:
[2024-08-26 13:10:28] Creating experiment, Experiment ID: z39sirw8
[2024-08-26 13:10:28] Starting web server...
[2024-08-26 13:10:29] INFO (main) Start NNI manager
[2024-08-26 13:10:29] INFO (RestServer) Starting REST server at port 8080, URL prefix: "/"
[2024-08-26 13:10:29] INFO (RestServer) REST server started.
[2024-08-26 13:10:29] INFO (NNIDataStore) Datastore initialization done
[2024-08-26 13:10:29] Setting up...
[2024-08-26 13:10:30] INFO (NNIManager) Resuming experiment: z39sirw8
[2024-08-26 13:10:30] INFO (NNIManager) Setup training service...
[2024-08-26 13:10:30] INFO (NNIManager) Setup tuner...
[2024-08-26 13:10:31] INFO (NNIManager) Number of current submitted trials: 621, where 0 is resuming.
[2024-08-26 13:10:31] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2024-08-26 13:10:31] Web portal URLs: http://127.0.0.1:8080 http://172.17.0.2:8080
[2024-08-26 13:10:31] Stopping experiment, please wait...
[2024-08-26 13:10:31] Saving experiment checkpoint...
[2024-08-26 13:10:31] Stopping NNI manager, if any...
[2024-08-26 13:10:31] INFO (ShutdownManager) Initiate shutdown: REST request
[2024-08-26 13:10:31] INFO (RestServer) Stopping REST server.
[2024-08-26 13:10:31] ERROR (ShutdownManager) Error during shutting down NniManager: TypeError: Cannot read properties of undefined (reading 'getBufferedAmount')
at TunerServer.sendCommand (/usr/local/lib/python3.8/dist-packages/nni_node/core/tuner_command_channel.js:60:26)
at NNIManager.stopExperimentTopHalf (/usr/local/lib/python3.8/dist-packages/nni_node/core/nnimanager.js:303:25)
at NNIManager.stopExperiment (/usr/local/lib/python3.8/dist-packages/nni_node/core/nnimanager.js:292:20)
at /usr/local/lib/python3.8/dist-packages/nni_node/common/globals/shutdown.js:49:23
at Array.map (<anonymous>)
at ShutdownManager.shutdown (/usr/local/lib/python3.8/dist-packages/nni_node/common/globals/shutdown.js:47:51)
at ShutdownManager.initiate (/usr/local/lib/python3.8/dist-packages/nni_node/common/globals/shutdown.js:22:18)
at /usr/local/lib/python3.8/dist-packages/nni_node/rest_server/restHandler.js:366:40
at Layer.handle [as handle_request] (/usr/local/lib/python3.8/dist-packages/nni_node/node_modules/express/lib/router/layer.js:95:5)
at next (/usr/local/lib/python3.8/dist-packages/nni_node/node_modules/express/lib/router/route.js:144:13)
[2024-08-26 13:10:31] INFO (NNIManager) Change NNIManager status from: RUNNING to: STOPPING
[2024-08-26 13:10:31] INFO (NNIManager) Stopping experiment, cleaning up ...
[2024-08-26 13:10:31] INFO (ShutdownManager) Shutdown complete.
[2024-08-26 13:10:31] INFO (RestServer) REST server stopped.
[2024-08-26 13:10:31] Experiment stopped.
root@e44bc2dd4409:/workspace/MediaPipePyTorch# Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/nni/__main__.py", line 85, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/nni/__main__.py", line 58, in main
dispatcher = MsgDispatcher(url, tuner, assessor)
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/msg_dispatcher.py", line 71, in __init__
super().__init__(command_channel_url)
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/msg_dispatcher_base.py", line 47, in __init__
self._channel.connect()
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/tuner_command_channel/channel.py", line 58, in connect
self._channel.connect()
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/command_channel/websocket/channel.py", line 23, in connect
self._ensure_conn()
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/command_channel/websocket/channel.py", line 75, in _ensure_conn
self._conn.connect()
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/command_channel/websocket/connection.py", line 65, in connect
self._ws = _wait(_connect_async(self._url))
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/command_channel/websocket/connection.py", line 121, in _wait
return future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/dist-packages/nni/runtime/command_channel/websocket/connection.py", line 135, in _connect_async
return await websockets.connect(url, max_size=None) # type: ignore
File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/client.py", line 655, in await_impl_timeout
return await self.await_impl()
File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/client.py", line 659, in await_impl
_transport, _protocol = await self._create_connection()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1033, in create_connection
raise OSError('Multiple exceptions: {}'.format(
OSError: Multiple exceptions: [Errno 111] Connect call failed ('127.0.0.1', 8080), [Errno 99] Cannot assign requested address
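For anyone trying to reproduce this: the `[Errno 111] Connect call failed ('127.0.0.1', 8080)` above means nothing was accepting connections on port 8080 when the tuner process tried to connect, which is consistent with the REST server having already stopped at 13:10:31. Below is a minimal sketch for checking that from inside the container right after running the resume command. `port_is_open` is a hypothetical helper written for this check, not part of NNI, and the host and port are simply the ones printed in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for diagnosis only; it is not an NNI API.
    """
    try:
        # create_connection raises OSError (e.g. ECONNREFUSED, errno 111)
        # when nothing is listening on the target address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port taken from the "Web portal URLs" line in the log above.
    print(port_is_open("127.0.0.1", 8080))
```

If this prints `False` immediately after the resume command returns, the web server has already shut down and the GUI has nothing to connect to.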