Error transitioning worker task from waiting to fetch #4757

Open
jrbourbeau opened this issue Apr 27, 2021 · 3 comments

Comments

@jrbourbeau
Member

Similar to #4721, some folks reported offline that they were encountering an error due to an unsupported task state transition on their workers (traceback shown below):

Traceback (most recent call last):
  File "/usr/local/python/python-3.7/std/lib64/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/usr/local/python/python-3.7/std/lib64/python3.7/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/usr/local/python/python-3.7/std/lib64/python3.7/site-packages/distributed/worker.py", line 2261, in gather_dep
    self.transition(ts, "fetch", worker=worker)
  File "/usr/local/python/python-3.7/std/lib64/python3.7/site-packages/distributed/worker.py", line 1584, in transition
    func = self._transitions[start, finish]
KeyError: ('waiting', 'fetch')

This was observed using dask and distributed 2021.04.0.

cc @fjetter @gforsyth

@fjetter
Member

fjetter commented Apr 27, 2021

FYI, that's a work-stealing transition.

We could patch this, but I'm not very eager to introduce more and more transitions right now without a solid concept for avoiding these kinds of problems.

@gforsyth
Contributor

I think that while patching #4721 was more of a "well, that shouldn't happen, but we can work around it for now" fix, this seems like more of an oversight (by me). waiting -> fetch is definitely a valid transition (and we have handling for the inverse).

@jrbourbeau
Member Author

Yeah, I agree with both of you. We should continue to develop higher-level solutions for improving worker state transitions, which I believe Florian is working on now, and also add a waiting -> fetch transition in the meantime.
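
To make the failure mode concrete, here is a minimal toy sketch of a dispatch table keyed by `(start, finish)` pairs, mirroring the `self._transitions[start, finish]` lookup visible in the traceback. Everything else (`ToyWorker`, `ToyTaskState`, the empty handler bodies) is hypothetical and is not the actual `distributed` worker code; it only illustrates why a missing pair surfaces as a `KeyError` and how registering a handler avoids it.

```python
# Toy model of a transition table keyed by (start, finish) state pairs.
# All class and method names are hypothetical; this is NOT distributed's
# Worker implementation, only an illustration of the lookup pattern.


class ToyTaskState:
    def __init__(self, key, state):
        self.key = key
        self.state = state


class ToyWorker:
    def __init__(self):
        # Only the inverse transition (fetch -> waiting) is registered at first.
        self._transitions = {
            ("fetch", "waiting"): self.transition_fetch_waiting,
        }

    def transition(self, ts, finish):
        start = ts.state
        if start == finish:
            return
        # A (start, finish) pair without a handler raises KeyError here.
        func = self._transitions[start, finish]
        func(ts)
        ts.state = finish

    def transition_fetch_waiting(self, ts):
        pass  # placeholder: real bookkeeping would go here

    def transition_waiting_fetch(self, ts):
        pass  # placeholder: real bookkeeping would go here


worker = ToyWorker()
ts = ToyTaskState("x", "waiting")

missing = None
try:
    worker.transition(ts, "fetch")
except KeyError as exc:
    # The missing dictionary key is the (start, finish) tuple.
    missing = exc.args[0]
    print("unsupported transition:", missing)

# Registering a handler for the missing pair makes the transition valid.
worker._transitions["waiting", "fetch"] = worker.transition_waiting_fetch
worker.transition(ts, "fetch")
print("state after registering handler:", ts.state)
```

In this toy version the fix is a single extra table entry, which is also why unhandled pairs are easy to overlook: nothing flags a missing combination until a task actually attempts it at runtime.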
