Add a `:dynamic` scheduling option for `Threads.@threads` #43919
Conversation
Force-pushed from 03255f8 to 9619413
base/threadingconstructs.jl (outdated)
- `:dynamic` is like `:static` except the tasks are assigned to threads dynamically,
  allowing more flexible scheduling if other tasks are active on other threads.
  Specifying `:dynamic` is allowed from inside another `@threads` loop and from
  threads other than 1.
I suggest avoiding specifying the behavior as much as possible, even though it may be a losing battle against Hyrum's law.
Before mentioning schedules, I think it'd be a good idea to explicitly say what is OK and what is not. Maybe something like:
Except for `:static` scheduling, how the iterations are assigned to tasks, and how the tasks are assigned to the worker threads, are undefined. The exact assignments can be different for each execution. The scheduling option should be considered as a hint. The loop body code (including any code transitively called from it) must not assume the task and worker thread in which it is executed. The loop body code for each iteration must be able to make forward progress and be free from data races, independent of the state of other iterations. As such, synchronization across iterations may cause deadlock, while unsynchronized access to overlapping locations can cause data races and hence undefined behavior. For example, the above conditions imply that:

- A lock taken in an iteration must be released within the same iteration.
- Avoid communication between iterations using, e.g., `Channel`s.
- Write only to locations not shared across iterations (unless a lock or atomic operation is used).

Furthermore, even though locks and atomics can be useful sometimes, it is often better to avoid them for performance.
(We can maybe borrow some ideas from the C++ executor definition to make the wording more precise.)
After this, I think we can simply mention
- `:dynamic` tries to schedule iterations dynamically to available worker threads,
  assuming that the workload for each iteration is reasonably uniform.
...which makes me wonder if `:uniform` is a better wording? Users may expect some kind of load-balancing for `:dynamic` scheduling.
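A minimal sketch of the rules quoted above (illustrative only; the names `results` and `log_lock` are not from the PR): each iteration writes only to its own slot, and any lock it takes is released before the iteration ends.

```julia
using Base.Threads

results = Vector{Float64}(undef, 100)
log_lock = ReentrantLock()

@threads for i in 1:100
    # Write only to a location not shared with other iterations.
    results[i] = sum(j -> sin(i * j), 1:1_000)

    # A lock taken in an iteration is released within the same iteration
    # (the do-block form of `lock` guarantees the release).
    lock(log_lock) do
        i % 25 == 0 && println("finished iteration $i")
    end
end
```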
Makes sense. I've pushed your suggestions.

I think `:dynamic` makes a bit more sense as it's a description of the scheduling policy, not the underlying workload, which `:uniform` refers to?
My thought was more like: since possible additional scheduling options will likely all be "dynamic" in some sense, it'd be better to emphasize other aspects. So yeah, it does focus on the underlying workload.
For example, it's reasonable to add a `:loadbalancing` option that splits the iteration space into `nthreads() * 8` chunks (say). But both `:dynamic` and `:loadbalancing` are "dynamic."
That makes sense, I think. If `:loadbalancing` were added, the schedules could be summarized in the docstring as:

- static

Dynamic options:
- uniform
- loadbalancing

Though I think an issue is that in code `@threads :uniform for` doesn't alone speak to dynamic behavior, but perhaps that's ok?
If this option becomes the "go-to option" for people who want to do threading (which it should probably be, if I understand correctly?), calling it `:uniform` is a bit confusing (unless it's the default and can be omitted).
IMHO "go to option" is FLoops.jl but it's really hard to tell people to not think "it's in Base
so it must be good" (but yeah, of course, I would say this 😆). But FLoops.jl is not ready for Base
and it's kind of stupid to not harvest really low-hanging fruit like this. That's why I thought that a patch like this is good.
Anyway, yes, I agree :uniform
doesn't shout "USE ME!". Also, it may be reasonable to argue that even :uniform
is specifying too much. Maybe we should just have :dynamic
. It let us, in the future, internally use something similar to :loadbalancing
if the compiler sees some branches and something similar to :uniform
otherwise. Or maybe we can do some clever run-time trick.
Then, maybe we should simply write:

New code using `@threads`, especially code that can be called from a library API, should try to use `:dynamic`. It schedules iterations dynamically to available worker threads while minimizing overhead.
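Assuming the option lands as `:dynamic` (as this PR proposes), the recommended usage would look something like this sketch, with each iteration doing independent work:

```julia
using Base.Threads

results = zeros(1_000)

# Each iteration only reads `i` and writes its own slot, so it is safe under
# any schedule; `:dynamic` lets the runtime place the tasks on free threads.
@threads :dynamic for i in 1:1_000
    results[i] = sum(sin, 1:i)
end
```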
What's the rationale for not making it the default? Is it safety considerations? If yes, it might be good to make that explicit, e.g. "The default is `:static`, to ensure safety, but code with independent iterations should use `:dynamic`, which is usually more performant."
`:static` is not the default, both before and after this PR. It also does not try to ensure "safety" for arbitrary code. The purpose of `:static` is to keep the pre-1.5 behavior, in order to help old programs that relied on the way `@threads` scheduled iterations in previous versions.
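For context, a sketch of the kind of pre-1.5 pattern that `:static` keeps working (illustrative; not from the PR): code that indexes per-thread storage by `threadid()` and relies on iteration `i` running on thread `i`.

```julia
using Base.Threads

# With :static and exactly nthreads() iterations, iteration i is pinned to
# thread i, so threadid() can be used as a stable per-iteration index.
partial = zeros(nthreads())
@threads :static for i in 1:nthreads()
    partial[threadid()] += sum(sin, 1:1_000 * i)
end
total = sum(partial)
```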
Co-Authored-By: Takafumi Arakaki <[email protected]>
Thanks, it looks like this is a great change for users confused by threads like me. I think the docs would benefit from being expanded a bit. For instance, is the dynamic schedule equivalent to a for loop with a spawn inside each iteration? I also think the example would be clearer with an iteration-dependent wait time instead of an additional spawn (the link between threads and spawn is not clear to me, nor, I would imagine, to many people). Also, the reference to the workload being uniform is confusing to me: in the limit where threading overhead is negligible compared to per-iteration cost, this strategy will be optimal irrespective of uniformity, right?
This is not the case. This is also exactly what I wanted to clarify in #43919 (comment) (now already in the docstring). If you spawn a task in each iteration, caveats like "Avoid communication between iterations using, e.g., `Channel`s" would not be needed. Consider:

```julia
ch = Channel()
@threads for i in 1:2
    if i == 1
        take!(ch)
    else
        put!(ch, i)
    end
end
```

If each iteration is a task on its own, there's no deadlock. However, if both iterations end up in the same task (which `@threads` is allowed to do, and does when there are fewer threads than iterations), `take!(ch)` in the first iteration blocks that task before it ever reaches `put!(ch, 2)`, and the loop deadlocks.
The distribution of the workload does play a crucial role. As a rather extreme case, consider:
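The concrete example is not reproduced here; as an illustrative sketch of the kind of extreme case that could be meant (assuming a few early iterations dominate the total cost), an even contiguous split into `nthreads()` chunks hands nearly all of the work to the first chunk, so the other threads finish early and idle, no matter how small the scheduling overhead is:

```julia
using Base.Threads

# Iterations 1:4 are ~10_000x more expensive than the rest. An even, contiguous
# split into nthreads() chunks puts all of them in the first chunk, so the wall
# time stays close to the serial time of those four iterations.
work(i) = i <= 4 ? sum(sin, 1:10^7) : sum(sin, 1:10^3)

@time @threads for i in 1:4 * nthreads()
    work(i)
end
```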
I think I'm even more confused than before now, and have no clear idea of how all the different schedules work, sorry if I'm dense. I understand the reluctance to document them in detail because they are implementation details that might change in the future, but since multithreading is, after all, an optimization, it feels important for users to have some idea of what's going on behind the scenes (and to tell them it might change in the future if necessary). If I understand correctly, static splits the iteration range into nthreads() chunks, and then each thread processes one of these chunks.
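A sketch of that mental model (a rough approximation, not the actual lowering; `chunked_run` is illustrative): split the range into `nthreads()` contiguous chunks and run one task per chunk. `:static` additionally pins chunk `k` to thread `k`, which plain `@spawn` below does not do.

```julia
using Base.Threads

# Rough model of how @threads partitions work: nthreads() contiguous chunks,
# one task per chunk. (The real implementation differs in details.)
function chunked_run(f, n)
    nt = nthreads()
    @sync for t in 1:nt
        lo = div((t - 1) * n, nt) + 1
        hi = div(t * n, nt)
        Threads.@spawn for i in lo:hi
            f(i)
        end
    end
end

chunked_run(i -> println(i, " handled on thread ", threadid()), 10)
```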
I agree that it is reasonable to explain the performance characteristics if we have multiple options. But we only have one recommended option, aka `:dynamic`. But if you think #43919 (comment) is not useful for writing a correct parallel loop, please do bring up the unclear points.

Note: I just remembered that there were similar PRs like #35003 and #32477. I also suggested it in #35646 (comment), but it was not reflected in #35646.
Thanks for clarifying, I was definitely confused about this. Currently, the docs don't explicitly say so.
Thanks for putting this together, LGTM!
Co-authored-by: Julian Samaroo <[email protected]>
Co-Authored-By: Takafumi Arakaki <[email protected]>
Force-pushed from 61bb51a to 5202378
Just review requests for final approval remain. (Also just added to NEWS.)
I just added some more nitpicks. I think it's good to go other than these points.
Co-authored-by: Takafumi Arakaki <[email protected]>
> assuming that the workload for each iteration is uniform

This is not really what I would have assumed "dynamic" means. I would have guessed this schedule was more closely syntactic sugar for `Threads.foreach`, which currently has generally had more optimal schedules than `@threads` or this PR.
Yeah, that's why I initially wanted to call it `:uniform`.

Since we established that synchronization across iterations is not a valid usage, I think there are more clever ways to lower `@threads`.
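For reference, the `Threads.foreach`-style approach mentioned above looks roughly like this sketch (illustrative only; it is not how this PR lowers `@threads`): iterations flow through a `Channel` and are pulled by a pool of worker tasks, which load-balances naturally.

```julia
using Base.Threads

results = zeros(100)

# Fill a buffered channel with the iteration indices, then close it so the
# worker tasks stop once it is drained.
ch = Channel{Int}(100)
foreach(i -> put!(ch, i), 1:100)
close(ch)

# Threads.foreach pulls items off the channel with `ntasks` concurrent tasks.
Threads.foreach(ch; ntasks = nthreads()) do i
    results[i] = sum(sin, 1:i)
end
```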
The new documentation of the scheduling options confuses me a bit. It starts with "Except for `:static` scheduling, ...", but it's not clear (at least not to me) whether everything that follows (the big paragraph and the examples) doesn't apply to `:static` as well. For example, the rather general statement that the exact assignments can be different for each execution seems to suggest that even `:static` behaves that way.
I think the key merge blocker now is whether this is called `:dynamic` or something else. Is there a multithreading call before the 1.8 feature freeze? If there is, perhaps it could be decided there? Perhaps once that's decided this can be merged, and the docs improved after.
Just FYI, I support `:dynamic`.
Great. So it seems we stick with `:dynamic`. Please feel free to merge if that sounds good.
Sounds good to me but I think @vtjnash is not very happy with the name.
IMO there could be further optimizations of the `:dynamic` schedule in later releases. If that plan doesn't sound good, it would be good to elaborate on the blocking issues, given the 1.8 feature freeze is looming.
Yeah, I think we spent enough time on tweaking the spec/implementation and gathering feedback. I think we peppered in enough caution (hopefully) to make future improvements possible. I'm merging this now. (The tester_win32 failure seems irrelevant.)
Add a `:dynamic` scheduling option for `Threads.@threads` (#43919)
Co-authored-by: Julian Samaroo <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
Ref #35646 - the idea here is largely proposed there. This is perhaps mainly a question of whether now is the right time to do this?
Enables a more composable variant of `Threads.@threads` via a new `:dynamic` schedule strategy.

An example, where `busywait` is a non-yielding timed loop that waits for a number of seconds, is sketched below. Note that `:static` is the current default where no schedule is specified. Unlike `:static`, the `:dynamic` strategy can be nested.

@vtjnash: A thought was that the threaded region didn't need to be marked with this strategy, is that correct?
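The example code from the description is not reproduced above; the following is a sketch of the kind of example meant, assuming `busywait` is the non-yielding timed loop described (its body here is illustrative, not from the PR):

```julia
using Base.Threads

# Non-yielding busy loop: occupies its thread for `t` seconds without hitting
# a yield point, so only true thread-level parallelism helps.
function busywait(t)
    deadline = time() + t
    while time() < deadline end
end

# With :dynamic the loops can be nested; all four busywait calls can run in
# parallel if enough threads are available. Nesting :static here would error.
@time begin
    @threads :dynamic for i in 1:2
        @threads :dynamic for j in 1:2
            busywait(1)
        end
    end
end
```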
Co-authors: @tkf @jpsamaroo