MPI Communication #43

Merged
merged 3 commits into from
Dec 21, 2018


Collaborator

@sslattery sslattery commented Nov 8, 2018

Adds support for MPI and basic communication plans and operations:

  1. CommunicationPlan - base class for all communication plans. Contains the base implementation which determines the number of imports and exports as well as neighbor ranks based on export data.

  2. Distributor - provides a communication plan for migrate to move data from one uniquely-owned distribution to another uniquely-owned distribution. This derives from CommunicationPlan.

  3. Halo - provides a communication plan for gather and scatter to manipulate ghosted data. This also derives from CommunicationPlan.
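
As a rough illustration of the "neighbor ranks and export counts from export data" step described for CommunicationPlan, here is a minimal plain C++ sketch. It is not the Cabana implementation: `ExportSummary` and `summarizeExports` are hypothetical names, the input is assumed to be one destination rank per local element, and the import counts are left out because in an MPI setting they would have to be exchanged between ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical summary of the local (communication-free) part of a plan.
struct ExportSummary
{
    // Unique destination ranks in ascending order.
    std::vector<int> neighbor_ranks;
    // Number of elements exported to the neighbor at the same index.
    std::vector<std::size_t> num_export;
};

// Assumption: element_export_ranks[i] is the rank that element i is sent to.
ExportSummary summarizeExports( const std::vector<int>& element_export_ranks )
{
    // Count how many elements go to each destination rank.
    std::map<int, std::size_t> counts;
    for ( int rank : element_export_ranks )
    {
        assert( rank >= 0 ); // MPI ranks are non-negative
        ++counts[rank];
    }

    // std::map iterates in key order, so the neighbors come out sorted.
    ExportSummary summary;
    for ( const auto& entry : counts )
    {
        summary.neighbor_ranks.push_back( entry.first );
        summary.num_export.push_back( entry.second );
    }
    return summary;
}
```

For example, calling `summarizeExports( { 2, 0, 2, 5, 0, 2 } )` yields neighbors 0, 2, and 5 with 2, 3, and 1 exported elements respectively.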

Each has a decent set of unit tests that caught some bugs. I have also tested this with a GPU-aware OpenMPI implementation although we are currently only using CUDA UVM (we will be adding regular CUDA memory soon).

@sslattery sslattery added the enhancement New feature or request label Nov 8, 2018
@sslattery sslattery self-assigned this Nov 8, 2018

codecov-io commented Nov 8, 2018

Codecov Report

Merging #43 into master will decrease coverage by 1.3%.
The diff coverage is 96.3%.


@@           Coverage Diff            @@
##           master     #43     +/-   ##
========================================
- Coverage    99.2%   97.8%   -1.4%     
========================================
  Files          12      15      +3     
  Lines         379     731    +352     
========================================
+ Hits          376     715    +339     
- Misses          3      16     +13
Impacted Files                          Coverage Δ
core/src/Cabana_Distributor.hpp         93.5% <93.5%> (ø)
core/src/Cabana_Halo.hpp                95.8% <95.8%> (ø)
core/src/Cabana_CommunicationPlan.hpp   99.1% <99.1%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ea3c5eb...4579770.

@sslattery sslattery changed the title from "[WIP] MPI Communication" to "MPI Communication" Dec 4, 2018
@sslattery
Collaborator Author

@dalg24 @junghans @sjplimp @rfbird This one is finally ready for review. I am not opposed to splitting this PR into 3 PRs - one for each of the 3 new classes. Please take a look when you have a chance. Feedback on the APIs would be useful.

@junghans junghans requested a review from rhalver December 5, 2018 00:40
Collaborator

@dalg24 dalg24 left a comment

TLDR

I only looked at CommunicationPlan so far.

Why are CommunicationPlan::createFromExportsAndTopology(), ::createFromExportsOnly(), and ::createExportSteering() public member functions? The first two feel like they should be constructors, and the last one should be marked as protected if you are only going to call it from derived classes.

Similarly, why did you choose Distributor::createFromExportsAndNeighbors() and ::createFromExports() over two constructor overloads?

elements going to neighbor with local id 1, etc.). Only indices of ghost
elements are in the list of exports.
*/
Kokkos::View<std::size_t*,kokkos_memory_space> exportSteering() const
Collaborator

Const-correctness: Either return Kokkos::View<std::size_t const*,kokkos_memory_space> or drop the const qualifier on the function member.
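
The const-correctness point can be illustrated without Kokkos. The sketch below assumes a hypothetical `ArrayHandle<T>` class that, like a Kokkos::View, shares its storage between copies, so a const member function that returns `ArrayHandle<std::size_t>` still lets callers write to the data. `ArrayHandle`, `Plan`, `steering()`, and `steeringMutable()` are illustrative names only and are not part of Cabana or Kokkos.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted array handle.
// Copies share storage, and constness of the handle itself is shallow.
template <class T>
class ArrayHandle
{
  public:
    using value_type = typename std::remove_const<T>::type;
    using storage_type = std::shared_ptr<std::vector<value_type>>;

    explicit ArrayHandle( storage_type storage )
        : _storage( std::move( storage ) )
    {
    }

    // Element access is a const member but returns T&, so only the
    // element type T decides whether the data can be modified.
    T& operator()( const std::size_t i ) const
    {
        assert( i < _storage->size() );
        return ( *_storage )[i];
    }

  private:
    storage_type _storage;
};

// Hypothetical plan class holding steering data.
class Plan
{
  public:
    explicit Plan( const std::size_t n )
        : _steering( std::make_shared<std::vector<std::size_t>>( n, 0 ) )
    {
    }

    // Compiles as a const member, yet callers can write through the result.
    ArrayHandle<std::size_t> steeringMutable() const
    {
        return ArrayHandle<std::size_t>( _steering );
    }

    // Const-correct variant: the element type is const, so access is read-only.
    ArrayHandle<const std::size_t> steering() const
    {
        return ArrayHandle<const std::size_t>( _steering );
    }

  private:
    std::shared_ptr<std::vector<std::size_t>> _steering;
};
```

With a `const Plan plan( 3 );`, the statement `plan.steeringMutable()( 1 ) = 7;` compiles and changes the stored data, while `plan.steering()( 1 ) = 7;` is rejected by the compiler because the element reference is const.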

Collaborator

Also, it is really confusing to give an accessor (a getter with no side effects) such a name.

Collaborator Author

I have changed the name as we discussed.

@sslattery
Collaborator Author

@dalg24 I refactored the inheritance structure here. The base class now has functions that are essentially protected but they must be made public to use class data with CUDA. I changed the create functions in the derived classes as well. This makes it clear that an entirely new communication plan is being created.

Collaborator

@dalg24 dalg24 left a comment

Some more feedback. I looked at Distributor and Halo.
It would have been nice to generalize/reuse loops that start the nonblocking receives and that perform the blocking sends.
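
One way to read this suggestion is a single helper that owns the loop structure (post every nonblocking receive, then perform every blocking send, then wait), with the per-neighbor work passed in as callables. The sketch below is a hypothetical illustration and not code from this PR: `exchangeWithNeighbors` and its parameters are made-up names, and in an MPI build the three callables would presumably wrap `MPI_Irecv`, `MPI_Send`, and `MPI_Waitall`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reusable exchange loop shared by several operations.
//   post_receive( n, rank ): start a nonblocking receive from neighbor n
//   send( n, rank ):         perform a blocking send to neighbor n
//   wait_all():              wait for all outstanding receives to complete
template <class PostReceive, class Send, class WaitAll>
void exchangeWithNeighbors( const std::vector<int>& neighbor_ranks,
                            PostReceive post_receive, Send send,
                            WaitAll wait_all )
{
    // Post all receives first so that matching sends can complete.
    for ( std::size_t n = 0; n < neighbor_ranks.size(); ++n )
    {
        assert( neighbor_ranks[n] >= 0 ); // MPI ranks are non-negative
        post_receive( n, neighbor_ranks[n] );
    }

    // Then do the blocking sends.
    for ( std::size_t n = 0; n < neighbor_ranks.size(); ++n )
        send( n, neighbor_ranks[n] );

    // Finally wait for the receives before the caller unpacks buffers.
    wait_all();
}
```

Each communication operation would then only supply its own buffer handling in the callables instead of repeating the three loops.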

@sslattery
Collaborator Author

@junghans What do you think is the best way here to enable CommunicationPlan, Distributor, Halo, and their tests/examples only when MPI is enabled?

@sslattery sslattery force-pushed the halo_exchange branch 3 times, most recently from abc868f to 0ccccc0 December 19, 2018 03:26
@sslattery sslattery force-pushed the halo_exchange branch 2 times, most recently from 258fb70 to e0d14d4 December 20, 2018 18:07
Cabana_Types.hpp
Cabana_VerletList.hpp
Cabana_Version.hpp
)
Collaborator

Say I create a new header and forget to list it here. Will it be caught when building and running the tests?

Collaborator Author

It may, although that is hard to tell: because the library is header-only, the build may just pull the header in anyway. It is more likely to show up when external libraries link in and the missing header is not installed.

Member

No, it won’t, hence globbing and subfolders might be preferred!

Collaborator Author

Let's push this as is with regard to the build system then, and move this conversation to #55. Once we come to a conclusion there, we can implement it in a new PR.

@sslattery
Collaborator Author

sslattery commented Dec 21, 2018

Things to finish:

  • Fix name of exportSteering() -> getExportSteering()

  • Fix name of numExportElement() -> exportSize()

  • Add MPI_Barrier to all communication functions to avoid potential race conditions between calls that use the same MPI tag, and then add async versions in the future

@sslattery sslattery merged commit ce733cd into ECP-copa:master Dec 21, 2018
@sslattery sslattery deleted the halo_exchange branch December 21, 2018 18:03
junghans added a commit to sslattery/Cabana that referenced this pull request May 14, 2020