Implementation of chunk balancer feature #1775

bencbartlett · 2021-09-29T22:37:40Z

CONTEXT: This feature uses empirical data about the timings of each MPI process to adjust the chunk sizes to achieve optimal load balancing for repeated simulations, such as iterations in an optimization run. Processes which are consistently slow and overworked will have their chunk sizes reduced for the next iteration of an optimization run, and underworked processes will handle larger chunks. This approach has the advantage that it implicitly incorporates variable load between different machines running the MPI processes.
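To make the idea concrete, here is a minimal one-dimensional sketch of timing-based rebalancing. It is not the code in this PR: the function name `rebalance_sizes` and the assumption that each process's cost scales linearly with its chunk size are both illustrative.

```python
from typing import List, Sequence


def rebalance_sizes(sizes: Sequence[float], times: Sequence[float]) -> List[float]:
    """Hypothetical helper: resize chunks so predicted per-process times match.

    Assumes the time a process needs is proportional to its chunk size, with a
    per-process constant estimated from the previous run (time / size).
    """
    if len(sizes) != len(times):
        raise ValueError("sizes and times must have the same length")
    if any(s <= 0 for s in sizes) or any(t <= 0 for t in times):
        raise ValueError("sizes and times must be positive")

    # Measured speed of each process: amount of chunk handled per unit time.
    speeds = [s / t for s, t in zip(sizes, times)]
    total_size = sum(sizes)
    total_speed = sum(speeds)

    # Give each process a share of the total proportional to its speed, so the
    # predicted time (new_size / speed) is the same for every process.
    return [total_size * v / total_speed for v in speeds]


# The slow process (2.0 s) gets a smaller chunk than the fast one (1.0 s).
new_sizes = rebalance_sizes([50.0, 50.0], [2.0, 1.0])
```

Because the speeds are measured rather than modeled, a process running on a slower machine shrinks its chunk in the same way as one that was simply assigned too much work.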

SCOPE:

- Added abstract class for chunk balancer
- Added DefaultChunkBalancer implementation, which adjusts chunk sizes according to per-process working time (ignoring all-all comms) while maintaining previous split directions
- Wrote unit tests for DefaultChunkBalancer to check for improvement in load-balancing, convergence to a load-balanced state, and that split_pos values are adjusted correctly for each iteration.
- Wrote unit tests for the MockSimulation class used by the DefaultChunkBalancerTests
- Added a binary_partition_utils.py library which includes lots of tree traversal algorithms which are useful to have for the chunk balancer
- Wrote unit tests for binary_partition_utils.py
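
As a rough illustration of the kind of tree traversal such a utility library might provide, the sketch below walks a binary partition and yields the bounding box owned by each process. The `Node` class and `leaf_boxes` function are hypothetical stand-ins written for this example; they are not the names or signatures used in binary_partition_utils.py.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

# One (lo, hi) interval per axis.
Box = Tuple[Tuple[float, float], ...]


@dataclass
class Node:
    """Hypothetical partition node: a leaf holds proc_id, otherwise a split."""
    proc_id: Optional[int] = None
    split_dir: Optional[int] = None    # axis index of the split
    split_pos: Optional[float] = None  # coordinate of the split along that axis
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaf_boxes(node: Node, box: Box) -> Iterator[Tuple[int, Box]]:
    """Depth-first traversal yielding (proc_id, box) for every leaf."""
    if node.is_leaf():
        yield node.proc_id, box
        return
    d = node.split_dir
    lo, hi = box[d]
    # Cut the current box along axis d at split_pos.
    left_box = box[:d] + ((lo, node.split_pos),) + box[d + 1:]
    right_box = box[:d] + ((node.split_pos, hi),) + box[d + 1:]
    yield from leaf_boxes(node.left, left_box)
    yield from leaf_boxes(node.right, right_box)


# Three processes: split along axis 0 first, then along axis 1 on the right.
tree = Node(
    split_dir=0, split_pos=0.0,
    left=Node(proc_id=0),
    right=Node(split_dir=1, split_pos=1.0,
               left=Node(proc_id=1), right=Node(proc_id=2)),
)
boxes = list(leaf_boxes(tree, ((-2.0, 2.0), (-2.0, 2.0))))
```

With per-leaf boxes in hand, a balancer can compute each process's chunk volume and pair it with the measured timings; changing only `split_pos` values leaves every `split_dir` as it was.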

codecov-commenter · 2021-09-29T22:58:46Z

Codecov Report

Merging #1775 (d89104b) into master (d41b0a4) will increase coverage by 0.79%.
The diff coverage is 92.27%.

@@            Coverage Diff             @@
##           master    #1775      +/-   ##
==========================================
+ Coverage   74.41%   75.20%   +0.79%     
==========================================
  Files          13       16       +3     
  Lines        4581     4796     +215     
==========================================
+ Hits         3409     3607     +198     
- Misses       1172     1189      +17

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/chunk_balancer.py | `89.02% <89.02%> (ø)` | |
| python/timing_measurements.py | `92.85% <92.85%> (ø)` | |
| python/binary_partition_utils.py | `95.18% <95.18%> (ø)` | |
| python/simulation.py | `76.72% <0.00%> (-0.06%)` | ⬇️ |
| python/adjoint/objective.py | `91.76% <0.00%> (+0.85%)` | ⬆️ |

stevengj · 2021-10-08T01:27:25Z

Failing tests?

bencbartlett · 2021-10-09T00:55:49Z

Sorry about that; I made a minor change that broke a test in test_timing_measurements.py. All fixed now.

bencbartlett added 7 commits September 16, 2021 15:31

fixed wrong test filename in Makefile.am

2543719

add new modules to Makefile.am to be included in __init__.py

f816486

bugfix in tests

a99227d

more test bugfixes

a4471a0

avoid merge conflict

168f147

Merge branch 'master' into chunk_balancer2

cd7c1d8

bencbartlett marked this pull request as ready for review September 30, 2021 00:10

stevengj reviewed Sep 30, 2021

doc/docs/Parallel_Meep.md (resolved)

stevengj reviewed Sep 30, 2021

doc/docs/Parallel_Meep.md (outdated, resolved)

stevengj reviewed Sep 30, 2021

doc/docs/Parallel_Meep.md (outdated, resolved)

refactored class names, added documentation for adjusting chunk layouts between runs of a program

e3ed37b

stevengj reviewed Oct 7, 2021

doc/docs/Parallel_Meep.md (outdated, resolved)

update to Parallel_Meep.md for missing chunk layout file

371d1f3

test bugfix

d89104b

stevengj merged commit d12e544 into NanoComp:master Oct 9, 2021
