Fix world age issue with custom streams for Distributed workers #42481
Seems odd to be doing this without holding a relevant lock, but lgtm
A lock around the global worker state in Distributed? Definitely a good idea, though orthogonal to the problem fixed here.
Would be good to have a test for custom message streams.
Yeah, many locks are likely missing, since we next try to iterate the shared …
Yeah, testing this reliably seemed like a lot more work than just fixing it 😬 In practice we found this problem to be quite intermittent in our use case where we have a custom …
Not necessarily? After #41449 tasks have a predictable world. So you just need to load and define some new code, and you should hit the world age issue.
Good point! We hit this in 1.6 but I forgot that #41449 is already merged in dev.
@c42f Can you rebase on the latest master? This should fix the …
If connect(::CustomClusterManager, ...) returns a custom transport stream, use of that stream by the task in start_gc_msgs_task() may fail due to the task executing in an old world age. Add an invokelatest() to prevent this problem.
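The failure mode described in this commit message can be reproduced in miniature without Distributed. The sketch below is only an illustration under stated assumptions: `CustomStream`, `send_msg`, `call_direct`, and `call_latest` are hypothetical names invented for the example and are not part of Distributed or of this PR. A function that is already running plays the role of the long-lived task, and a method defined while it runs plays the role of the custom stream code loaded later.

```julia
# Hypothetical stand-ins; none of these names come from Distributed itself.
struct CustomStream end          # plays the role of a custom transport stream
function send_msg end            # generic function with no methods yet

# Calls `send_msg` directly from code that started running before the
# method below existed, so the call is dispatched in an old world age.
function call_direct(s)
    @eval send_msg(::CustomStream) = :sent   # method lands in a newer world
    try
        return send_msg(s)
    catch err
        return err                           # the "method too new" error
    end
end

# Same situation, but the call is routed through invokelatest, which
# dispatches in the latest world age and therefore sees the new method.
function call_latest(s)
    @eval send_msg(::CustomStream) = :sent
    return Base.invokelatest(send_msg, s)
end

direct = call_direct(CustomStream())
latest = call_latest(CustomStream())
println("direct call failed with MethodError: ", direct isa MethodError)
println("invokelatest returned: ", latest)
```

`Base.invokelatest` gives up static dispatch for that one call, which is presumably acceptable for an infrequent background operation such as flushing GC messages.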
2ff7930 to 2e44818
Rebased. Truth be told I still don't know how to write a good test for this without rewriting some things to have more control over how the …
CI is all green, but it looks like the PR still needs a test, so I'll remove the …
This seems strictly better than the existing code and is likely to fix a real problem. The change is also such that it is extremely unlikely to cause any regressions. So merge and open an issue about adding a test?
That seems fine to me!
Seems good to me. I've merged this and opened #43109.
If connect(::CustomClusterManager, ...) returns a custom transport stream, use of that stream by the task in start_gc_msgs_task() may fail due to the task executing in an old world age. Add an invokelatest() to prevent this problem. (cherry picked from commit a05bcb2)
…aLang/julia#42481) If connect(::CustomClusterManager, ...) returns a custom transport stream, use of that stream by the task in start_gc_msgs_task() may fail due to the task executing in an old world age. Add an invokelatest() to prevent this problem. (cherry picked from commit 24afe90)
If connect(::CustomClusterManager, ...) returns a custom transport stream, use of that stream by the task in start_gc_msgs_task() may fail due to the task executing in an old world age. Add an invokelatest() to prevent this problem.
CC @tanmaykm