InvalidTransition: Impossible transition from memory to missing #6125
FWIW I've now seen this in the wild with #6110:
2022-04-14 21:17:20,415 - distributed.worker - ERROR - Worker stream died during communication: tls://10.6.5.108:37353
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/tornado/iostream.py", line 867, in _read_to_buffer
bytes_read = self.read_from_fd(buf)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/tornado/iostream.py", line 1592, in read_from_fd
return self.socket.recv_into(buf, len(buf))
File "/opt/conda/envs/coiled/lib/python3.9/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/envs/coiled/lib/python3.9/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3019, in gather_dep
response = await get_data_from_worker(
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 4320, in get_data_from_worker
return await retry_operation(_get_data, operation="get_data_from_worker")
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/utils_comm.py", line 381, in retry_operation
return await retry(
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/utils_comm.py", line 366, in retry
return await coro()
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 4300, in _get_data
response = await send_recv(
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/core.py", line 709, in send_recv
response = await comm.read(deserializers=deserializers)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/comm/tcp.py", line 242, in read
convert_stream_closed_error(self, e)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TLS (closed) Ephemeral Worker->Worker for gather local=tls://10.6.6.70:44414 remote=tls://10.6.5.108:37353>: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-14 21:17:20,442 - distributed.utils - ERROR - Impossible transition from memory to missing for ('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/utils.py", line 693, in log_errors
yield
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3094, in gather_dep
self.transitions(recommendations, stimulus_id=stimulus_id)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2607, in transitions
a_recs, a_instructions = self._transition(
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2543, in _transition
raise InvalidTransition(
distributed.worker_state_machine.InvalidTransition: Impossible transition from memory to missing for ('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))
2022-04-14 21:17:20,646 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fc44deef3d0>>, <Task finished name='Task-1771' coro=<Worker.gather_dep() done, defined at /opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py:2963> exception=InvalidTransition("Impossible transition from memory to missing for ('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))")>)
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/tornado/ioloop.py", line 741, in _run_callback
ret = callback()
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
future.result()
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3094, in gather_dep
self.transitions(recommendations, stimulus_id=stimulus_id)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2607, in transitions
a_recs, a_instructions = self._transition(
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2543, in _transition
raise InvalidTransition(
distributed.worker_state_machine.InvalidTransition: Impossible transition from memory to missing for ('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))
2022-04-14 21:17:39,338 - distributed.worker - ERROR - Exception during execution of task ('shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 123).
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3693, in _prepare_args_for_execution
data[k] = self.data[k]
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/zict/buffer.py", line 87, in __getitem__
raise KeyError(key)
KeyError: "('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3497, in execute
args2, kwargs2 = self._prepare_args_for_execution(ts, args, kwargs)
File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3697, in _prepare_args_for_execution
data[k] = Actor(type(self.actors[k]), self.address, k, self)
KeyError: "('split-shuffle-1-b4961b03aa9e8bec7c581d2dc337f717', 10, (3, 9))"
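For readers unfamiliar with this kind of error: the traceback shows `gather_dep` passing recommendations to `self.transitions(...)` after the comm failure, and `_transition` raising because there is no handler for going from `memory` to `missing`. The toy state machine below is only a sketch of that general pattern. The `ALLOWED` table, `transition`, and `on_gather_failed` are hypothetical names made up for illustration; this is not distributed's actual transition table, and the guard shown is not the fix from #6123.

```python
class InvalidTransition(Exception):
    """Raised when no handler exists for a (start, finish) state pair."""


# Hypothetical, deliberately tiny transition table (NOT distributed's real one).
ALLOWED = {
    ("fetch", "flight"),
    ("flight", "memory"),
    ("flight", "missing"),
    ("missing", "fetch"),
    ("memory", "released"),
}


def transition(states, key, finish):
    """Move `key` to `finish`, or raise if the pair is not in the table."""
    start = states[key]
    if start == finish:
        return
    if (start, finish) not in ALLOWED:
        raise InvalidTransition(
            f"Impossible transition from {start} to {finish} for {key!r}"
        )
    states[key] = finish


def on_gather_failed(states, key):
    """Hypothetical guard: only mark a key missing if it is still in flight.

    Returns True if the key was moved to 'missing', False if it was skipped.
    """
    if states.get(key) == "flight":
        transition(states, key, "missing")
        return True
    return False
```

In this sketch, calling `transition(states, key, "missing")` for a key whose state is already `memory` raises `InvalidTransition`, while `on_gather_failed` leaves such a key untouched.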
I'm reproducing this locally in #6123
Yeah, resolved there I think
I believe that this is resolved by #6123
cc @fjetter @gjoseph92