
Add multithreading to LHCoptim #17

Merged
merged 7 commits into from
Mar 29, 2021
Conversation

eljungsk
Contributor

@eljungsk eljungsk commented Mar 14, 2021

Hej!

I had a stab at adding multithreading to LHCoptim. Profiling showed that most of the time was spent in _AudzeEglaisDist followed by _fixedcross!, so these are the two portions of LHCoptim that are threaded.

It's worth noting that a single global RNG is still used (instead of each thread having its own), which means the results are neither consistent across different thread counts nor repeatable when running on more than one thread.
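For reference, the usual way to make threaded stochastic code repeatable for a fixed thread count is to give each thread its own seeded RNG. A minimal sketch of that pattern (illustration only, not this PR's code; all names are made up) could look like:

```julia
using Random

# Illustrative sketch, not part of this PR: one seeded RNG per thread
# makes a threaded stochastic loop repeatable for a fixed thread count.
# Indexing by threadid() assumes tasks do not migrate between threads,
# which holds for Threads.@threads on the Julia versions discussed here.
function threaded_rand_sum(n; seed = 1234)
    rngs = [MersenneTwister(seed + i) for i in 1:Threads.nthreads()]
    partial = zeros(Threads.nthreads())
    Threads.@threads for i in 1:n
        tid = Threads.threadid()
        partial[tid] += rand(rngs[tid])
    end
    return sum(partial)
end
```

Note that even with per-thread seeding, the results would still differ between different thread counts, since the work is partitioned differently.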

To make threading opt-in even when multiple threads are available, there is a macro called @maybe_threaded, which unfortunately uses a non-exported function from Base.Threads. This was the only way I could make it work, since Threads.@threads doesn't compose well inside another macro.
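As an illustration of the opt-in idea (this is not the PR's actual @maybe_threaded macro, which relies on a non-exported Base.Threads function), one way to sketch runtime-selectable threading using only documented API is to branch on the flag and partition the work across Threads.@spawn tasks:

```julia
# Hypothetical illustration only, not the PR's implementation: opt-in
# parallelism via Threads.@spawn, which composes more easily than
# Threads.@threads inside other macros. `threading` and
# `maybe_threaded_sum` are made-up names for this sketch.
function maybe_threaded_sum(f, n; threading = false)
    if threading && Threads.nthreads() > 1
        # Split 1:n into one contiguous chunk per thread
        chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
        tasks = [Threads.@spawn sum(f, c) for c in chunks]
        return sum(fetch.(tasks))
    else
        return sum(f, 1:n)
    end
end
```

With `threading = false` the serial path runs unchanged, so single-threaded users pay no overhead, which matches the opt-in behaviour described above.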

Performance-wise, I have only been able to test on my 8-year-old laptop with two glorious cores, but the results look quite promising:

julia> @benchmark LHCoptim(100,3,1000, threading = false)
BenchmarkTools.Trial: 
  memory estimate:  578.03 MiB
  allocs estimate:  1242768
  --------------
  minimum time:     3.639 s (0.83% GC)
  median time:      3.672 s (0.83% GC)
  mean time:        3.672 s (0.83% GC)
  maximum time:     3.705 s (0.84% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark LHCoptim(100,3,1000, threading = true)
BenchmarkTools.Trial: 
  memory estimate:  566.63 MiB
  allocs estimate:  727455
  --------------
  minimum time:     2.121 s (1.28% GC)
  median time:      2.128 s (1.31% GC)
  mean time:        2.133 s (1.33% GC)
  maximum time:     2.148 s (1.30% GC)
  --------------
  samples:          3
  evals/sample:     1

It might be a good idea to benchmark on a beefier machine too before merging :)

/Emil

@eljungsk
Contributor Author

OK, so this seems to work only on Julia > v1.5 due to changes in the threading internals.

@eljungsk
Contributor Author

eljungsk commented Mar 16, 2021

Tests are now passing for all versions (apart from AppVeyor, which seems to be broken).

@MrUrq
Owner

MrUrq commented Mar 19, 2021

Hej!

Thanks, this looks great. I tested the new implementation on 1, 2, 4, 6, ..., 18, 20 threads on a machine with 20 physical threads and got these results:
@benchmark LHCoptim(100,3,1000; threading=true) seconds=60

[Figure: benchmark performance vs. number of threads]

I also tried it on a 10x larger plan, @benchmark LHCoptim(1000,3,100; threading=true) seconds=60, for 1, 2, 4, 12 and 20 threads, which showed performance increases all the way up to 20 threads (a 10x improvement at 20 threads). I also compared it to the previous version and found no performance regression relative to using 1 thread.

I think we can drop support for Julia 1.0 in favour of making the package easier to maintain in the future. What do you think?

Thanks a lot for this PR; it's a great performance increase, especially on large plans!

@eljungsk
Contributor Author

Great to see that it scales reasonably well, at least for larger plans!

It might be that other parts of the optimization loop are the bottleneck for high thread counts on large plans, and I wouldn't be surprised if it's possible to squeeze out a bit more performance by threading these parts as well. On the other hand, I guess that doing so would worsen the performance on smaller plans as we introduce more threading overhead, so there is probably a trade-off for which type of plan to prioritize. But that should probably be the scope of a possible future PR :)
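The overhead trade-off described above is easy to observe directly. A quick hypothetical micro-check (made-up functions, unrelated to the PR's code) is to benchmark a tiny workload both serially and threaded:

```julia
using BenchmarkTools

# Hypothetical micro-check: on very small workloads, the cost of task
# scheduling can exceed the work itself, which is the trade-off
# discussed above when deciding how much of the loop to thread.
serial_work(n) = sum(sqrt(i) for i in 1:n)

function threaded_work(n)
    partial = zeros(Threads.nthreads())
    Threads.@threads for i in 1:n
        partial[Threads.threadid()] += sqrt(i)
    end
    return sum(partial)
end

# @btime serial_work(100)    # typically far faster than the threaded
# @btime threaded_work(100)  # version here: scheduling overhead dominates
```

For large n the ordering flips, which is why threading only the hot spots (_AudzeEglaisDist and _fixedcross!) is a sensible default.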

I don't have any strong opinions on whether to drop support for 1.0 or not. I guess it would make sense to keep it for now and drop it when the next LTS is out?

@MrUrq
Owner

MrUrq commented Mar 29, 2021

Could do some further benchmarking with the threading in place and see what the next low-hanging fruit is.

I will keep 1.0 support for now and merge it in. Thanks!

@MrUrq MrUrq merged commit b3d5ce8 into MrUrq:master Mar 29, 2021