Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated multi-threading branch #1628

Merged
merged 12 commits into from
Aug 26, 2021
Merged

updated multi-threading branch #1628

merged 12 commits into from
Aug 26, 2021

Conversation

stevengj
Copy link
Collaborator

@stevengj stevengj commented Jun 23, 2021

WIP to rebase @smartalecH's multithreading branch to fix #1627 and fix #228.

Includes some additional thread-safety fixes (to parallelize the DFT updates, and to avoid calling back to Python or other user callbacks from multiple threads).

src/meep/vec.hpp Outdated Show resolved Hide resolved
@stevengj
Copy link
Collaborator Author

The tests don't currently exercise the openmp code (we need to configure with --enable-openmp and set OMP_NUM_THREADS to enable this functionality), but at least they are green, so this hopefully isn't harming anything.

It would be good to do a quick performance benchmark without openmp on this branch, to make sure that @smartalecH's loop rewriting didn't hurt the serial performance.

@stevengj
Copy link
Collaborator Author

We probably want to parallelize the LOOP_OVER_IVECS in dft_chunk::update_dft as well, since that is often a significant fraction of the runtime.

src/fields.cpp Outdated Show resolved Hide resolved
src/structure.cpp Outdated Show resolved Hide resolved
src/structure.cpp Outdated Show resolved Hide resolved
@@ -0,0 +1,80 @@
/*
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smartalecH, should this file be included in make check?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was just for my personal debugging. Didn't realize it was added.

@stevengj stevengj marked this pull request as ready for review July 21, 2021 02:56
@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2021

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.46%. Comparing base (21c94c0) to head (79ffca8).
Report is 396 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1628   +/-   ##
=======================================
  Coverage   73.46%   73.46%           
=======================================
  Files          13       13           
  Lines        4557     4557           
=======================================
  Hits         3348     3348           
  Misses       1209     1209           

@oskooi
Copy link
Collaborator

oskooi commented Aug 23, 2021

This branch should be rebased after #1737 is merged first.

@smartalecH smartalecH mentioned this pull request Aug 23, 2021
@smartalecH
Copy link
Collaborator

All of the LOOP macros that involve python callbacks do not currently use the multithreading implementation.

There's a nifty swig feature, -threads, that allows us to temporarily release the python GIL, call whatever python function we need (in parallel), and then reenable the GIL. Many of the callbacks aren't required to be sequential -- rather they are just evaluating some user-defined function (e.g. a permittivity distribution). So this would be a perfect use case (and consequently, speed up the simulation init step quite a bit).

@stevengj stevengj merged commit 3eabcef into master Aug 26, 2021
@stevengj stevengj deleted the sgj/multithreading branch August 26, 2021 14:55
@oskooi
Copy link
Collaborator

oskooi commented Aug 26, 2021

After this PR was merged earlier today, I noticed that if building Meep from the master branch (commit: 74cb6dd) using --with-openmp to enable multithreading from this branch, if you run e.g. any of the unit tests in python/tests and the environment variable OMP_NUM_THREADS is not set, the timestepping rate slows down by nearly two orders of magnitude. It's only when this environment variable is set (e.g., $ OMP_NUM_THREADS=2 mpirun -np 2 python test_force.py) that the timestepping rate is what it should be. Perhaps we need to check if OMP_NUM_THREADS has not been set and if so set it to a default value of 1?

@oskooi
Copy link
Collaborator

oskooi commented Aug 26, 2021

According to this page:

If you forget to set OMP_NUM_THREADS to any value, the default value of your cluster environment will be used. In many cases, the default is 1, so that your program is executed serially. If this envvar is not set at all the OpenMP run time may also deciede to use up all cores of your computer which must not always be the expected outcome, so it is a good idea always to set a meaningful value.

@stevengj
Copy link
Collaborator Author

Maybe we should call omp_set_num_threads(1) somewhere if the OMP_NUM_THREADS environment variable is not set.

bencbartlett pushed a commit to bencbartlett/meep that referenced this pull request Sep 9, 2021
* multithreading branch by smartalecH

* compilation fixes

* rm pragma unroll statements, which shouldn't be needed

* parallelize update_dft

* use MPI_Init_thread with openmp

* fix idx_dft counter for parallel loop

* IVEC_LOOP_COUNTER macro

* disable non-thread-safe PLOOP for epsilon averaging

* simplifications, fixes

* make sure step_generic_stride1.cpp is re-generated

* nevermind, step_generic.cpp dependency is sufficient

* comment formatting

Co-authored-by: Alec Hammond <[email protected]>
mawc2019 pushed a commit to mawc2019/meep that referenced this pull request Nov 3, 2021
* multithreading branch by smartalecH

* compilation fixes

* rm pragma unroll statements, which shouldn't be needed

* parallelize update_dft

* use MPI_Init_thread with openmp

* fix idx_dft counter for parallel loop

* IVEC_LOOP_COUNTER macro

* disable non-thread-safe PLOOP for epsilon averaging

* simplifications, fixes

* make sure step_generic_stride1.cpp is re-generated

* nevermind, step_generic.cpp dependency is sufficient

* comment formatting

Co-authored-by: Alec Hammond <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hybrid OpenMP/MPI OpenMP parallelism
4 participants