Add a `:dynamic` scheduling option for `Threads.@threads` #43919

IanButterworth · 2022-01-25T03:09:29Z

Ref #35646 - the idea here is largely proposed there. This is perhaps mainly a question of whether now is the right time to do this?

Enables a more composable variant of Threads.@threads via a new :dynamic schedule strategy.

An example where busywait is a non-yielding timed loop that waits for a number of seconds.
Note that :static is the current default where no schedule is specified:

julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :static for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
6.003001 seconds (16.33 k allocations: 899.255 KiB, 0.25% compilation time)

julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :dynamic for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
2.012056 seconds (16.05 k allocations: 883.919 KiB, 0.66% compilation time)

Unlike :static the :dynamic strategy can be nested.

julia> @time begin
        Threads.@spawn busywait(5)
        Threads.@threads :dynamic for i in 1:Threads.nthreads()
            Threads.@threads :dynamic for j in 1:Threads.nthreads()
                busywait(1)
            end
        end
    end
 17.023767 seconds (61.46 k allocations: 3.402 MiB, 0.14% compilation time)

julia> Threads.nthreads()
16

@vtjnash: A thought was that the threaded region didn't need to be marked with this strategy, is that correct?

Co-authors: @tkf @jpsamaroo

base/threadingconstructs.jl

tkf · 2022-01-25T04:05:46Z

base/threadingconstructs.jl

+- `:dynamic` is like `:static` except the tasks are assigned to threads dynamically,
+            allowing more flexible scheduling if other tasks are active on other threads.
+            Specifying `:dynamic` is allowed from inside another `@threads` loop and from
+            threads other than 1.


I suggest avoiding specifying the behavior as much as possible, even though it may be a losing battle against Hyrum's law.

Before mentioning schedules, I think it'd be a good idea to explicitly say what is OK and what is not. Maybe something like:

Except :static scheduling, how the iterations are assigned to tasks, and how the tasks are assigned to the worker threads are undefined. The exact assignments can be different for each execution. The scheduling option should be considered as a hint. The loop body code (including any code transitively called from it) must not assume the task and worker thread in which they are executed. The loop body code for each iteration must be able to make forward progress and be free from data races independent of the state of other iterations. As such, synchronizations across iterations may invoke deadlock while unsynchronized access to overlapping locations can invoke data races and hence undefined behaviors.

For example, the above conditions imply that:

The lock taken in an iteration must be released within the same iteration.

Avoid communication between iterations using, e.g., Channels.

Write only to locations not shared across iterations (unless lock or atomic operation is used).

Furthermore, even though lock and atomic can be useful sometimes, it is often better to avoid them for performance.

(We can maybe borrow some ideas from the C++ executor definition to make the wording more precise.)

After this, I think we can simply mention

Suggested change

- `:dynamic` is like `:static` except the tasks are assigned to threads dynamically,

allowing more flexible scheduling if other tasks are active on other threads.

Specifying `:dynamic` is allowed from inside another `@threads` loop and from

threads other than 1.

- `:dynamic` tries to schedule iterations dynamically to available worker threads,

assuming that the workload for each iteration is reasonably uniform.

...which makes me wonder if :uniform is a better wording? Users may expect some kind of load-balancing for :dynamic scheduling.

Makes sense. I've pushed your suggestions.

I think :dynamic makes a bit more sense as it's a description of the scheduling policy, not the underlying workload, which :uniform refers to?

My thought was more like, since possible additional scheduling will likely be all "dynamic" in some sense, it'd be better to emphasize other aspects. So yeah, it does focus on the underlying workload.

For example, it's reasonable to add :loadbalancing that splits the iteration space to nthreads() * 8 chunks (say). But both :dynamic and :loadbalancing are "dynamic."

That makes sense I think. If :loadbalaning was added the schedules could be summarized in the docstring as

- static Dynamic options: - uniform - loadbalancing

Though, I think an issue is that in code @threads :uniform for doesn't alone speak to dynamic behavior, but perhaps that's ok?

If this option becomes the "go to option" for people who want to do threading (which it should probably be if I understand correctly?), calling it uniform is a bit confusing (unless it's the default and can be omitted)

IMHO "go to option" is FLoops.jl but it's really hard to tell people to not think "it's in Base so it must be good" (but yeah, of course, I would say this 😆). But FLoops.jl is not ready for Base and it's kind of stupid to not harvest really low-hanging fruit like this. That's why I thought that a patch like this is good.

Anyway, yes, I agree :uniform doesn't shout "USE ME!". Also, it may be reasonable to argue that even :uniform is specifying too much. Maybe we should just have :dynamic. It let us, in the future, internally use something similar to :loadbalancing if the compiler sees some branches and something similar to :uniform otherwise. Or maybe we can do some clever run-time trick.

Then, maybe we should simply write:

New code using @threads, especially those that can be called from library API, should try to use :dynamic. It schedules iterations dynamically to available worker threads while minimizing overhead.

What's the rationale for not making it the default? Is it safety considerations? If yes, might be good to make it explicit, eg "The default is :static, to ensure safety, but code with independent iterations should use :dynamic which is usually more performant."

:static is not the default, both before and after this PR. It also does not try to ensure "safety" for arbitrary code. The purpose of :static is to keep the pre-1.5 behavior in order to help the old programs that assumed the way @threads scheduled iterations in the previous versions.

Co-Authored-By: Takafumi Arakaki <[email protected]>

antoine-levitt · 2022-01-25T06:40:39Z

Thanks, it looks like this is a great change for users confused by threads like me. I think the docs would benefit from being expanded a bit. For instance, is the dynamic schedule equivalent to a for loop and a spawn inside each iteration? I also think the example would be clearer with an iteration dependent wait time instead of an additional spawn (the link between thrreads and spawn is not clear to me, nor I would imagine to many people). Also the reference to workload being uniform is confusing to me : in the limit where threading overhead is negligible compared to per iteration cost, this strategy will be optimal irrespective of uniformity, right?

tkf · 2022-01-25T08:02:06Z

For instance, is the dynamic schedule equivalent to a for loop and a spawn inside each iteration?

This is not the case. This is also exactly what I wanted to clarify in #43919 (comment) (now already in the docstring). If you spawn a task in each iteration, caveats like "Avoid communication between iterations using, e.g., Channels." don't apply. To make it more concrete, consider

ch = Channel()
@threads for i in 1:2
    if i == 1
        take!(ch)
    else
        put!(ch, i)
    end
end

If each iteration is a task on its own, there's no deadlock. However, if @threads does some clever thing and decide to use just one task, it's equivalent to the serial loop. It's easy to see that the above code deadlocks.

Also the reference to workload being uniform is confusing to me : in the limit where threading overhead is negligible compared to per iteration cost, this strategy will be optimal irrespective of uniformity, right?

The distribution of the workload does play a crucial role. As a rather extreme case, consider for i in 1:10; sleep(exp(i)); end and assume that sleep(t) really is a compute-intensive function that takes about t seconds to complete. Also, assume that I don't have more than 10 CPUs and so I set nthreads < 10. Every iteration takes more than e seconds. Compared to this, task spawn+sync overhead is nothing. But I do want to spawn more than nthreads tasks in this case because the last few tasks dominate the whole run-time.

antoine-levitt · 2022-01-25T09:52:30Z

I think I'm even more confused than before now, and have no clear idea of how all the different schedules work, sorry if I'm dense. I understand the reluctance to document them in detail because they are implementation details that might change in the future, but since multithreading is, after all, an optimization, it feels important for users to have some idea of what's going on behind the scenes (and tell them it might change in the future if necessary). If I understand correctly, static splits the iteration range in nthreads chunks, then each thread treats these chunks. for i =1:N @spawn begin ... end end creates N tasks which form a pool, and the nthreads take work from there. So what does dynamic do if not these two?

tkf · 2022-01-25T11:22:11Z

I agree that it is reasonable to explain the performance characteristics if we have multiple options. But we only have one recommended option aka :dynamic. So, I don't think there is a need for explaining the difference, at least now.

But, if you think #43919 (comment) is not useful for writing correct parallel loop, please do bring up the unclear points.

As for why something like :dynamic is not the default, it is presumably because triage wanted to keep the behavior of @threads in #35646 similar to the one prior to the PR. @JeffBezanson said "in a later version we can change the default schedule" (which was pre-1.5). Is 1.8 sufficiently later?

Note: I just remember that there were similar PRs like #35003 and #32477. I also suggested it in #35646 (comment) but it was not reflected in #35646.

antoine-levitt · 2022-01-25T11:45:21Z

Thanks for clarifying, I was definitely confused about this. Currently, the docs doesn't explicitly say that Threads.@threads is the same as Threads.@threads :static but since there's only one schedule I feel that it heavily implies it. In that case, can the old behavior be made to be a schedule? That would be :static, :old (hopefully with a better name), and :dynamic.

jpsamaroo

Thanks for putting this together, LGTM!

base/threadingconstructs.jl

Co-authored-by: Julian Samaroo <[email protected]>

base/threadingconstructs.jl

Co-Authored-By: Takafumi Arakaki <[email protected]>

base/threadingconstructs.jl

IanButterworth · 2022-02-04T23:09:03Z

Just review requests for final approval. (also just added to NEWS)

tkf

I just added some more nitpicks. I think it's good to go other than these points.

NEWS.md

base/threadingconstructs.jl

test/threads_exec.jl

Co-authored-by: Takafumi Arakaki <[email protected]>

vtjnash

assuming that the workload for each iteration is uniform

This is not really what I would have assumed "dynamic" means. I would have guessed this schedule was more closely syntactic sugar for Threads.foreach, which currently has generally had the more optimal schedules than @threads or this PR.

tkf · 2022-02-05T22:03:52Z

Yeah, that's why I initially wanted to call it :uniform.

:dynamic will schedule iterations dynamically to available worker threads. Currently, it assumes that the workload for each iteration is uniform.

Since we established that synchronization across iterations is not a valid usage, I think there are more clever ways to lower :dynamic to make it work nicely with non-uniform workloads. Then, the users don't have to choose between (say) :uniform and :loadbalancing schedulers.

test/threads_exec.jl

carstenbauer · 2022-02-08T22:55:28Z

The new documentation of the scheduling options confuses me a bit. It starts with

Except :static scheduling

but it's not clear (at least not to me) whether everything that follows (the big paragraph and the examples) doesn't apply to :static scheduling or only the initial parts of it.

For example, the rather general statement

The scheduling option is a hint. The loop body code (including any code
transitively called from it) must not make assumptions about the distribution of iterations
to tasks or the worker thread in which they are executed.

seems to suggest that even :static might be overruled internally (i.e. isn't safe to rely upon). Perhaps we should clarify this a bit better? (or maybe I'm misunderstanding something?)

IanButterworth · 2022-02-09T00:28:47Z

I think the key merge blocker now is whether this is called :dynamic or :uniform or something else?

Is there a multithreading call before 1.8 feature freeze? If there is, perhaps it could be decided there?
Or does someone feel it can be concluded now?

Perhaps once that's decided this can be merged, and docs be improved after

tkf · 2022-02-09T02:42:55Z

:dynamic or :uniform or something else?

Just FYI I support :dynamic now. The wording in my reply to Jameson sounded I that I still want :uniform and so I tweaked it now.

IanButterworth · 2022-02-09T02:53:40Z

Great. So it seems we stick with :dynamic and there might be room for docs improvement in a future PR?

Please feel free to merge if that sounds good.

tkf · 2022-02-09T03:43:43Z

Sounds good to me but I think @vtjnash is not very happy with the name :dynamic?

IanButterworth · 2022-02-12T01:36:07Z

IMO there could be further optimizations of the :dynamic (default) schedule in the future, so perhaps this is a good starting point? The idea agreed in triage being that this is followed up by a PR to make :dynamic the default.

If that plan doesn't sound good, it would be good to elaborate on the blocking issues given the 1.8 feature freeze is looming.

tkf · 2022-02-12T02:16:18Z

Yeah, I think we spent enough time on tweaking spec/implementation and gathering feedback. I think we peppered enough caution (hopefully) to make future improvements possible. I'm merging this now. (tester_win32 failure seems irrelevant)

…43919) Co-authored-by: Julian Samaroo <[email protected]> Co-authored-by: Takafumi Arakaki <[email protected]> Co-authored-by: Valentin Churavy <[email protected]>

IanButterworth requested review from tkf, vtjnash and jpsamaroo January 25, 2022 03:09

IanButterworth added the multithreading Base.Threads and related functionality label Jan 25, 2022

add :dynamic scheduling option for Threads.@threads

9619413

IanButterworth force-pushed the ib/threads_dynamic branch from 03255f8 to 9619413 Compare January 25, 2022 03:19

tkf reviewed Jan 25, 2022

View reviewed changes

base/threadingconstructs.jl Outdated Show resolved Hide resolved

IanButterworth added 2 commits January 24, 2022 22:54

fix docstring default explanation

6428d6d

tidy tests

381fef0

tkf reviewed Jan 25, 2022

View reviewed changes

add expanded explanation

5cb286a

Co-Authored-By: Takafumi Arakaki <[email protected]>

jpsamaroo approved these changes Jan 25, 2022

View reviewed changes

IanButterworth and others added 2 commits January 25, 2022 13:55

Apply suggestions from code review

be14f3c

Co-authored-by: Julian Samaroo <[email protected]>

include busywait def in example

a0e17e9

tkf reviewed Jan 27, 2022

View reviewed changes

base/threadingconstructs.jl Outdated Show resolved Hide resolved

IanButterworth and others added 3 commits January 27, 2022 01:29

tweak & expand tests

6ba0323

fix dynamic range splits

65c7ef2

Co-Authored-By: Takafumi Arakaki <[email protected]>

Merge branch 'master' into ib/threads_dynamic

5202378

IanButterworth force-pushed the ib/threads_dynamic branch from 61bb51a to 5202378 Compare January 27, 2022 06:51

This comment has been minimized.

Sign in to view

tkf mentioned this pull request Jan 27, 2022

weird interaction between sleep function and Threads #43952

Open

This comment has been minimized.

Sign in to view

tkf reviewed Jan 27, 2022

View reviewed changes

base/threadingconstructs.jl Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

Update NEWS.md

6a096fd

tkf reviewed Feb 5, 2022

View reviewed changes

Apply suggestions from code review

4d69729

Co-authored-by: Takafumi Arakaki <[email protected]>

tkf approved these changes Feb 5, 2022

View reviewed changes

vtjnash reviewed Feb 5, 2022

View reviewed changes

tkf reviewed Feb 5, 2022

View reviewed changes

test/threads_exec.jl Outdated Show resolved Hide resolved

Fix a syntax error

b17dc91

tkf closed this Feb 5, 2022

tkf reopened this Feb 5, 2022

This comment was marked as resolved.

Sign in to view

IanButterworth mentioned this pull request Feb 8, 2022

add progress bar to allocation preparation JuliaPerf/PProf.jl#53

Merged

vchuravy approved these changes Feb 12, 2022

View reviewed changes

tkf merged commit e376fcc into JuliaLang:master Feb 12, 2022

IanButterworth deleted the ib/threads_dynamic branch February 12, 2022 02:17

This was referenced Feb 12, 2022

Make @threads :dynamic default #44136

Merged

Add @threads :dynamic #40908

Closed

tkf mentioned this pull request Feb 14, 2022

Clarify the behavior of @threads for #44168

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a `:dynamic` scheduling option for `Threads.@threads` #43919

Add a `:dynamic` scheduling option for `Threads.@threads` #43919

IanButterworth commented Jan 25, 2022 •

edited

Loading

tkf Jan 25, 2022 •

edited

Loading

IanButterworth Jan 25, 2022

tkf Jan 25, 2022

IanButterworth Jan 25, 2022 •

edited

Loading

antoine-levitt Jan 25, 2022

tkf Jan 25, 2022

antoine-levitt Jan 25, 2022

tkf Jan 25, 2022 •

edited

Loading

antoine-levitt commented Jan 25, 2022

tkf commented Jan 25, 2022

antoine-levitt commented Jan 25, 2022

tkf commented Jan 25, 2022 •

edited

Loading

antoine-levitt commented Jan 25, 2022

jpsamaroo left a comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

IanButterworth commented Feb 4, 2022 •

edited

Loading

tkf left a comment

vtjnash left a comment

tkf commented Feb 5, 2022 •

edited

Loading

This comment was marked as resolved.

This comment was marked as resolved.

carstenbauer commented Feb 8, 2022 •

edited

Loading

IanButterworth commented Feb 9, 2022

tkf commented Feb 9, 2022

IanButterworth commented Feb 9, 2022

tkf commented Feb 9, 2022 •

edited

Loading

IanButterworth commented Feb 12, 2022

tkf commented Feb 12, 2022

Add a :dynamic scheduling option for Threads.@threads #43919

Add a :dynamic scheduling option for Threads.@threads #43919

Conversation

IanButterworth commented Jan 25, 2022 • edited Loading

tkf Jan 25, 2022 • edited Loading

Choose a reason for hiding this comment

IanButterworth Jan 25, 2022

Choose a reason for hiding this comment

tkf Jan 25, 2022

Choose a reason for hiding this comment

IanButterworth Jan 25, 2022 • edited Loading

Choose a reason for hiding this comment

antoine-levitt Jan 25, 2022

Choose a reason for hiding this comment

tkf Jan 25, 2022

Choose a reason for hiding this comment

antoine-levitt Jan 25, 2022

Choose a reason for hiding this comment

tkf Jan 25, 2022 • edited Loading

Choose a reason for hiding this comment

antoine-levitt commented Jan 25, 2022

tkf commented Jan 25, 2022

antoine-levitt commented Jan 25, 2022

tkf commented Jan 25, 2022 • edited Loading

antoine-levitt commented Jan 25, 2022

jpsamaroo left a comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

IanButterworth commented Feb 4, 2022 • edited Loading

tkf left a comment

Choose a reason for hiding this comment

vtjnash left a comment

Choose a reason for hiding this comment

tkf commented Feb 5, 2022 • edited Loading

This comment was marked as resolved.

This comment was marked as resolved.

carstenbauer commented Feb 8, 2022 • edited Loading

IanButterworth commented Feb 9, 2022

tkf commented Feb 9, 2022

IanButterworth commented Feb 9, 2022

tkf commented Feb 9, 2022 • edited Loading

IanButterworth commented Feb 12, 2022

tkf commented Feb 12, 2022

Add a `:dynamic` scheduling option for `Threads.@threads` #43919

Add a `:dynamic` scheduling option for `Threads.@threads` #43919

IanButterworth commented Jan 25, 2022 •

edited

Loading

tkf Jan 25, 2022 •

edited

Loading

IanButterworth Jan 25, 2022 •

edited

Loading

tkf Jan 25, 2022 •

edited

Loading

tkf commented Jan 25, 2022 •

edited

Loading

IanButterworth commented Feb 4, 2022 •

edited

Loading

tkf commented Feb 5, 2022 •

edited

Loading

carstenbauer commented Feb 8, 2022 •

edited

Loading

tkf commented Feb 9, 2022 •

edited

Loading