Implementation of chunk balancer feature #1775

Merged 10 commits into NanoComp:master on Oct 9, 2021


bencbartlett
Contributor

CONTEXT: This feature uses measured timings from each MPI process to adjust chunk sizes and improve load balancing across repeated simulations, such as the iterations of an optimization run. Processes that are consistently slow (overworked) have their chunk sizes reduced for the next iteration, and underworked processes are given larger chunks. Because it relies on measured timings, this approach implicitly accounts for load differences between the machines running the MPI processes.
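The rebalancing idea described above can be illustrated with a minimal one-dimensional sketch. It is not taken from this PR's code: the function name `rebalance_sizes` is hypothetical, and the sketch assumes that each process's working time scales linearly with its chunk size.

```python
from typing import List


def rebalance_sizes(sizes: List[float], times: List[float]) -> List[float]:
    """Return new chunk sizes with the same total as `sizes`.

    Hypothetical helper, not part of Meep. Assumes the time a process
    spends is proportional to its chunk size, so its measured speed is
    size / time. New sizes are made proportional to those speeds.
    """
    if len(sizes) != len(times):
        raise ValueError("sizes and times must have the same length")
    if any(t <= 0 for t in times) or any(s <= 0 for s in sizes):
        raise ValueError("sizes and times must be positive")

    total = sum(sizes)
    # Work units handled per second during the previous iteration.
    speeds = [s / t for s, t in zip(sizes, times)]
    total_speed = sum(speeds)
    # Slow processes receive smaller chunks, fast ones larger chunks.
    return [total * v / total_speed for v in speeds]


# Two processes with equal chunks; process 0 took twice as long.
old_sizes = [100.0, 100.0]
measured_times = [2.0, 1.0]
new_sizes = rebalance_sizes(old_sizes, measured_times)

# Predicted time per process if the measured speeds stay the same.
predicted = [n / (s / t) for n, s, t in zip(new_sizes, old_sizes, measured_times)]
```

Under the linear-cost assumption, the predicted times come out equal, which is the load-balanced state the feature aims for. Real timings are noisy, so several iterations may be needed in practice.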

SCOPE:

  • Added abstract class for chunk balancer
  • Added DefaultChunkBalancer implementation, which adjusts chunk sizes according to per-process working time (ignoring all-to-all communication) while preserving the previous split directions
  • Wrote unit tests for DefaultChunkBalancer that check for improved load balancing, convergence to a load-balanced state, and correct adjustment of split_pos values on each iteration
  • Wrote unit tests for the MockSimulation class used by the DefaultChunkBalancerTests
  • Added a binary_partition_utils.py library containing tree traversal algorithms used by the chunk balancer
  • Wrote unit tests for binary_partition_utils.py
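As a rough illustration of adjusting split_pos values while keeping split directions, here is a self-contained sketch on a toy binary partition tree. It does not use Meep's classes or the functions in binary_partition_utils.py. The `Node` class, `leaf_ids`, `leaf_volumes`, `rebalance`, and the split formula are all assumptions made for this example: each side's load density is taken as its measured time per unit volume, and the split is placed so that the predicted time per process is equal on both sides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = List[Tuple[float, float]]  # one (low, high) pair per axis


@dataclass
class Node:
    """Toy partition node (hypothetical). Leaves set proc_id only."""
    proc_id: Optional[int] = None
    split_dir: Optional[int] = None  # axis index: 0 = x, 1 = y
    split_pos: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf_ids(node: Node) -> List[int]:
    """Depth-first traversal collecting the process IDs under a node."""
    if node.proc_id is not None:
        return [node.proc_id]
    return leaf_ids(node.left) + leaf_ids(node.right)


def split_box(box: Box, axis: int, pos: float) -> Tuple[Box, Box]:
    lo, hi = box[axis]
    left, right = list(box), list(box)
    left[axis], right[axis] = (lo, pos), (pos, hi)
    return left, right


def leaf_volumes(node: Node, box: Box, out: Dict[int, float]) -> Dict[int, float]:
    """Map each process ID to the volume of its chunk."""
    if node.proc_id is not None:
        vol = 1.0
        for lo, hi in box:
            vol *= hi - lo
        out[node.proc_id] = vol
        return out
    left_box, right_box = split_box(box, node.split_dir, node.split_pos)
    leaf_volumes(node.left, left_box, out)
    leaf_volumes(node.right, right_box, out)
    return out


def rebalance(node: Node, box: Box, times: Dict[int, float],
              old_vol: Dict[int, float]) -> Node:
    """Return a new tree with the same split directions and new split_pos."""
    if node.proc_id is not None:
        return Node(proc_id=node.proc_id)
    ids_l, ids_r = leaf_ids(node.left), leaf_ids(node.right)
    # Load density: measured time per unit volume on each side.
    rho_l = sum(times[i] for i in ids_l) / sum(old_vol[i] for i in ids_l)
    rho_r = sum(times[i] for i in ids_r) / sum(old_vol[i] for i in ids_r)
    n_l, n_r = len(ids_l), len(ids_r)
    # Fraction of the box given to the left side so that
    # rho_l * V_left / n_l == rho_r * V_right / n_r.
    frac = n_l * rho_r / (n_l * rho_r + n_r * rho_l)
    lo, hi = box[node.split_dir]
    new_pos = lo + frac * (hi - lo)
    left_box, right_box = split_box(box, node.split_dir, new_pos)
    return Node(split_dir=node.split_dir, split_pos=new_pos,
                left=rebalance(node.left, left_box, times, old_vol),
                right=rebalance(node.right, right_box, times, old_vol))


# 4 x 2 cell: split along x at 2.0, then the right half along y at 1.0.
cell: Box = [(0.0, 4.0), (0.0, 2.0)]
tree = Node(split_dir=0, split_pos=2.0, left=Node(proc_id=0),
            right=Node(split_dir=1, split_pos=1.0,
                       left=Node(proc_id=1), right=Node(proc_id=2)))
times = {0: 4.0, 1: 1.0, 2: 1.0}  # process 0 is overworked
new_tree = rebalance(tree, cell, times, leaf_volumes(tree, cell, {}))
# The root split moves from x = 2.0 to about x = 0.8; the y split stays at 1.0.
```

In this example process 0 took four times as long as the others, so its chunk shrinks while the two processes on the right share the enlarged remainder. The split directions are untouched, which matches the constraint stated in the list above.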

@codecov-commenter

codecov-commenter commented Sep 29, 2021

Codecov Report

Merging #1775 (d89104b) into master (d41b0a4) will increase coverage by 0.79%.
The diff coverage is 92.27%.

@@            Coverage Diff             @@
##           master    #1775      +/-   ##
==========================================
+ Coverage   74.41%   75.20%   +0.79%     
==========================================
  Files          13       16       +3     
  Lines        4581     4796     +215     
==========================================
+ Hits         3409     3607     +198     
- Misses       1172     1189      +17     
| Impacted Files | Coverage Δ |
|---|---|
| python/chunk_balancer.py | 89.02% <89.02%> (ø) |
| python/timing_measurements.py | 92.85% <92.85%> (ø) |
| python/binary_partition_utils.py | 95.18% <95.18%> (ø) |
| python/simulation.py | 76.72% <0.00%> (-0.06%) ⬇️ |
| python/adjoint/objective.py | 91.76% <0.00%> (+0.85%) ⬆️ |

@bencbartlett bencbartlett marked this pull request as ready for review September 30, 2021 00:10
@stevengj
Collaborator

stevengj commented Oct 8, 2021

Failing tests?

@bencbartlett
Contributor Author

Sorry about that, I made a minor change that broke a test in test_timing_measurements.py. All fixed now.

@stevengj stevengj merged commit d12e544 into NanoComp:master Oct 9, 2021
mawc2019 pushed a commit to mawc2019/meep that referenced this pull request Nov 3, 2021
* Implementation of chunk balancer feature


* fixed wrong test filename in Makefile.am

* add new modules to Makefile.am to be included in __init__.py

* bugfix in tests

* more test bugfixes

* avoid merge conflict

* refactored class names, added documentation for adjusting chunk layouts between runs of a program

* update to Parallel_Meep.md for missing chunk layout file

* test bugfix

4 participants