Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a :dynamic scheduling option for Threads.@threads #43919

Merged
merged 22 commits into from
Feb 12, 2022

Conversation

IanButterworth
Copy link
Member

@IanButterworth IanButterworth commented Jan 25, 2022

Ref #35646 - the idea here is largely proposed there. This is perhaps mainly a question of whether now is the right time to do this?


Enables a more composable variant of Threads.@threads via a new :dynamic schedule strategy.

An example where busywait is a non-yielding timed loop that waits for a number of seconds.
Note that :static is the current default where no schedule is specified:

julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :static for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
6.003001 seconds (16.33 k allocations: 899.255 KiB, 0.25% compilation time)

julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :dynamic for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
2.012056 seconds (16.05 k allocations: 883.919 KiB, 0.66% compilation time)

Unlike :static the :dynamic strategy can be nested.

julia> @time begin
        Threads.@spawn busywait(5)
        Threads.@threads :dynamic for i in 1:Threads.nthreads()
            Threads.@threads :dynamic for j in 1:Threads.nthreads()
                busywait(1)
            end
        end
    end
 17.023767 seconds (61.46 k allocations: 3.402 MiB, 0.14% compilation time)

julia> Threads.nthreads()
16

@vtjnash: A thought was that the threaded region didn't need to be marked with this strategy, is that correct?

Co-authors: @tkf @jpsamaroo

@IanButterworth IanButterworth added the multithreading Base.Threads and related functionality label Jan 25, 2022
base/threadingconstructs.jl Outdated Show resolved Hide resolved
Comment on lines 120 to 123
- `:dynamic` is like `:static` except the tasks are assigned to threads dynamically,
allowing more flexible scheduling if other tasks are active on other threads.
Specifying `:dynamic` is allowed from inside another `@threads` loop and from
threads other than 1.
Copy link
Member

@tkf tkf Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest avoiding specifying the behavior as much as possible, even though it may be a losing battle against Hyrum's law.

Before mentioning schedules, I think it'd be a good idea to explicitly say what is OK and what is not. Maybe something like:

Except :static scheduling, how the iterations are assigned to tasks, and how the tasks are assigned to the worker threads are undefined. The exact assignments can be different for each execution. The scheduling option should be considered as a hint. The loop body code (including any code transitively called from it) must not assume the task and worker thread in which they are executed. The loop body code for each iteration must be able to make forward progress and be free from data races independent of the state of other iterations. As such, synchronizations across iterations may invoke deadlock while unsynchronized access to overlapping locations can invoke data races and hence undefined behaviors.

For example, the above conditions imply that:

  • The lock taken in an iteration must be released within the same iteration.
  • Avoid communication between iterations using, e.g., Channels.
  • Write only to locations not shared across iterations (unless lock or atomic operation is used).

Furthermore, even though lock and atomic can be useful sometimes, it is often better to avoid them for performance.

(We can maybe borrow some ideas from the C++ executor definition to make the wording more precise.)

After this, I think we can simply mention

Suggested change
- `:dynamic` is like `:static` except the tasks are assigned to threads dynamically,
allowing more flexible scheduling if other tasks are active on other threads.
Specifying `:dynamic` is allowed from inside another `@threads` loop and from
threads other than 1.
- `:dynamic` tries to schedule iterations dynamically to available worker threads,
assuming that the workload for each iteration is reasonably uniform.

...which makes me wonder if :uniform is a better wording? Users may expect some kind of load-balancing for :dynamic scheduling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've pushed your suggestions.

I think :dynamic makes a bit more sense as it's a description of the scheduling policy, not the underlying workload, which :uniform refers to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was more like, since possible additional scheduling will likely be all "dynamic" in some sense, it'd be better to emphasize other aspects. So yeah, it does focus on the underlying workload.

For example, it's reasonable to add :loadbalancing that splits the iteration space to nthreads() * 8 chunks (say). But both :dynamic and :loadbalancing are "dynamic."

Copy link
Member Author

@IanButterworth IanButterworth Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense I think. If :loadbalaning was added the schedules could be summarized in the docstring as

- static

Dynamic options:
- uniform
- loadbalancing

Though, I think an issue is that in code @threads :uniform for doesn't alone speak to dynamic behavior, but perhaps that's ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this option becomes the "go to option" for people who want to do threading (which it should probably be if I understand correctly?), calling it uniform is a bit confusing (unless it's the default and can be omitted)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO "go to option" is FLoops.jl but it's really hard to tell people to not think "it's in Base so it must be good" (but yeah, of course, I would say this 😆). But FLoops.jl is not ready for Base and it's kind of stupid to not harvest really low-hanging fruit like this. That's why I thought that a patch like this is good.

Anyway, yes, I agree :uniform doesn't shout "USE ME!". Also, it may be reasonable to argue that even :uniform is specifying too much. Maybe we should just have :dynamic. It let us, in the future, internally use something similar to :loadbalancing if the compiler sees some branches and something similar to :uniform otherwise. Or maybe we can do some clever run-time trick.

Then, maybe we should simply write:

New code using @threads, especially those that can be called from library API, should try to use :dynamic. It schedules iterations dynamically to available worker threads while minimizing overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for not making it the default? Is it safety considerations? If yes, might be good to make it explicit, eg "The default is :static, to ensure safety, but code with independent iterations should use :dynamic which is usually more performant."

Copy link
Member

@tkf tkf Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:static is not the default, both before and after this PR. It also does not try to ensure "safety" for arbitrary code. The purpose of :static is to keep the pre-1.5 behavior in order to help the old programs that assumed the way @threads scheduled iterations in the previous versions.

Co-Authored-By: Takafumi Arakaki <[email protected]>
@antoine-levitt
Copy link
Contributor

Thanks, it looks like this is a great change for users confused by threads like me. I think the docs would benefit from being expanded a bit. For instance, is the dynamic schedule equivalent to a for loop and a spawn inside each iteration? I also think the example would be clearer with an iteration dependent wait time instead of an additional spawn (the link between thrreads and spawn is not clear to me, nor I would imagine to many people). Also the reference to workload being uniform is confusing to me : in the limit where threading overhead is negligible compared to per iteration cost, this strategy will be optimal irrespective of uniformity, right?

@tkf
Copy link
Member

tkf commented Jan 25, 2022

For instance, is the dynamic schedule equivalent to a for loop and a spawn inside each iteration?

This is not the case. This is also exactly what I wanted to clarify in #43919 (comment) (now already in the docstring). If you spawn a task in each iteration, caveats like "Avoid communication between iterations using, e.g., Channels." don't apply. To make it more concrete, consider

ch = Channel()
@threads for i in 1:2
    if i == 1
        take!(ch)
    else
        put!(ch, i)
    end
end

If each iteration is a task on its own, there's no deadlock. However, if @threads does some clever thing and decide to use just one task, it's equivalent to the serial loop. It's easy to see that the above code deadlocks.

Also the reference to workload being uniform is confusing to me : in the limit where threading overhead is negligible compared to per iteration cost, this strategy will be optimal irrespective of uniformity, right?

The distribution of the workload does play a crucial role. As a rather extreme case, consider for i in 1:10; sleep(exp(i)); end and assume that sleep(t) really is a compute-intensive function that takes about t seconds to complete. Also, assume that I don't have more than 10 CPUs and so I set nthreads < 10. Every iteration takes more than e seconds. Compared to this, task spawn+sync overhead is nothing. But I do want to spawn more than nthreads tasks in this case because the last few tasks dominate the whole run-time.

@antoine-levitt
Copy link
Contributor

I think I'm even more confused than before now, and have no clear idea of how all the different schedules work, sorry if I'm dense. I understand the reluctance to document them in detail because they are implementation details that might change in the future, but since multithreading is, after all, an optimization, it feels important for users to have some idea of what's going on behind the scenes (and tell them it might change in the future if necessary). If I understand correctly, static splits the iteration range in nthreads chunks, then each thread treats these chunks. for i =1:N @spawn begin ... end end creates N tasks which form a pool, and the nthreads take work from there. So what does dynamic do if not these two?

@tkf
Copy link
Member

tkf commented Jan 25, 2022

I agree that it is reasonable to explain the performance characteristics if we have multiple options. But we only have one recommended option aka :dynamic. So, I don't think there is a need for explaining the difference, at least now.

But, if you think #43919 (comment) is not useful for writing correct parallel loop, please do bring up the unclear points.

As for why something like :dynamic is not the default, it is presumably because triage wanted to keep the behavior of @threads in #35646 similar to the one prior to the PR. @JeffBezanson said "in a later version we can change the default schedule" (which was pre-1.5). Is 1.8 sufficiently later?

Note: I just remember that there were similar PRs like #35003 and #32477. I also suggested it in #35646 (comment) but it was not reflected in #35646.

@antoine-levitt
Copy link
Contributor

Thanks for clarifying, I was definitely confused about this. Currently, the docs doesn't explicitly say that Threads.@threads is the same as Threads.@threads :static but since there's only one schedule I feel that it heavily implies it. In that case, can the old behavior be made to be a schedule? That would be :static, :old (hopefully with a better name), and :dynamic.

Copy link
Member

@jpsamaroo jpsamaroo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, LGTM!

base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
@IanButterworth

This comment has been minimized.

@tkf

This comment has been minimized.

base/threadingconstructs.jl Outdated Show resolved Hide resolved
@IanButterworth

This comment has been minimized.

@IanButterworth

This comment has been minimized.

@IanButterworth
Copy link
Member Author

IanButterworth commented Feb 4, 2022

Just review requests for final approval. (also just added to NEWS)

Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added some more nitpicks. I think it's good to go other than these points.

NEWS.md Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
base/threadingconstructs.jl Outdated Show resolved Hide resolved
test/threads_exec.jl Outdated Show resolved Hide resolved
test/threads_exec.jl Outdated Show resolved Hide resolved
test/threads_exec.jl Outdated Show resolved Hide resolved
test/threads_exec.jl Outdated Show resolved Hide resolved
test/threads_exec.jl Outdated Show resolved Hide resolved
Co-authored-by: Takafumi Arakaki <[email protected]>
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming that the workload for each iteration is uniform

This is not really what I would have assumed "dynamic" means. I would have guessed this schedule was more closely syntactic sugar for Threads.foreach, which currently has generally had the more optimal schedules than @threads or this PR.

@tkf
Copy link
Member

tkf commented Feb 5, 2022

Yeah, that's why I initially wanted to call it :uniform.

:dynamic will schedule iterations dynamically to available worker threads. Currently, it assumes that the workload for each iteration is uniform.

Since we established that synchronization across iterations is not a valid usage, I think there are more clever ways to lower :dynamic to make it work nicely with non-uniform workloads. Then, the users don't have to choose between (say) :uniform and :loadbalancing schedulers.

test/threads_exec.jl Outdated Show resolved Hide resolved
@tkf tkf closed this Feb 5, 2022
@tkf tkf reopened this Feb 5, 2022
@tkf

This comment was marked as resolved.

@giordano

This comment was marked as resolved.

@carstenbauer
Copy link
Member

carstenbauer commented Feb 8, 2022

The new documentation of the scheduling options confuses me a bit. It starts with

Except :static scheduling

but it's not clear (at least not to me) whether everything that follows (the big paragraph and the examples) doesn't apply to :static scheduling or only the initial parts of it.

For example, the rather general statement

The scheduling option is a hint. The loop body code (including any code
transitively called from it) must not make assumptions about the distribution of iterations
to tasks or the worker thread in which they are executed.

seems to suggest that even :static might be overruled internally (i.e. isn't safe to rely upon). Perhaps we should clarify this a bit better? (or maybe I'm misunderstanding something?)

@IanButterworth
Copy link
Member Author

I think the key merge blocker now is whether this is called :dynamic or :uniform or something else?

Is there a multithreading call before 1.8 feature freeze? If there is, perhaps it could be decided there?
Or does someone feel it can be concluded now?

Perhaps once that's decided this can be merged, and docs be improved after

@tkf
Copy link
Member

tkf commented Feb 9, 2022

:dynamic or :uniform or something else?

Just FYI I support :dynamic now. The wording in my reply to Jameson sounded I that I still want :uniform and so I tweaked it now.

@IanButterworth
Copy link
Member Author

Great. So it seems we stick with :dynamic and there might be room for docs improvement in a future PR?

Please feel free to merge if that sounds good.

@tkf
Copy link
Member

tkf commented Feb 9, 2022

Sounds good to me but I think @vtjnash is not very happy with the name :dynamic?

@IanButterworth
Copy link
Member Author

IMO there could be further optimizations of the :dynamic (default) schedule in the future, so perhaps this is a good starting point? The idea agreed in triage being that this is followed up by a PR to make :dynamic the default.

If that plan doesn't sound good, it would be good to elaborate on the blocking issues given the 1.8 feature freeze is looming.

@tkf
Copy link
Member

tkf commented Feb 12, 2022

Yeah, I think we spent enough time on tweaking spec/implementation and gathering feedback. I think we peppered enough caution (hopefully) to make future improvements possible. I'm merging this now. (tester_win32 failure seems irrelevant)

@tkf tkf merged commit e376fcc into JuliaLang:master Feb 12, 2022
@IanButterworth IanButterworth deleted the ib/threads_dynamic branch February 12, 2022 02:17
This was referenced Feb 12, 2022
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
…43919)

Co-authored-by: Julian Samaroo <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…43919)

Co-authored-by: Julian Samaroo <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…43919)

Co-authored-by: Julian Samaroo <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants