Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hydro optimizations: Explicit work aggregation, Comm optimizations, explicit SIMD instructions #426

Merged
merged 144 commits into from
Jun 24, 2022

Conversation

G-071
Copy link
Member

@G-071 G-071 commented Jun 22, 2022

This PR contains 3 major changes to the hydro module:

  1. Added explicit work aggregation for the two major hydro compute kernel (reconstruct, flux) using the new executors added in Work aggregation experimental SC-SGS/CPPuddle#12.
    The maximum number of aggregated kernel launches can be steered, CLI parameter --max_executor_slices=<unsigned int>.
    This feature is yet to be considered experimental, though it passed all tests and increases performance substantially. This PR also adds some new tests to continually test different aggregation sizes. The work aggregation works for all hydro GPU kernel implementations (CUDA, HIP, Kokkos).

  2. Improved communication in the hydro solver! Notably, exchanging the hydro_boundaries (and subsequently the hydro_amr_boundaries) now takes the HPX localities into account. If the neighbor in question is on the same locality, no HPX action will be used for communication! Further, no communication buffer will be used. Instead, the memory of the neighbor is accessed directly (with some new local hpx promises/futures to avoid any races). The old communication implementation can still be used with the parameter --optimize_local_communication=0.

  3. Added first hydro SIMD implementation for flux and reconstruct kernels. The kernels still work on the GPU (using the Kokkos SIMD double specialization), but support a wide range of CPU SIMD instructions sets as we can both use std::experimental::simd and the Kokkos SIMD types on CPU (tested mostly with AVX512 so far). This also makes the previous experimental Vc flux kernel obsolete. Hence, I removed it. Furthermore, while adding this SIMD implementation I also cleaned up the experimental scalar kernels for flux and reconstruct which were leftovers from the initial hydro GPU port (removing a lot of code duplication in the process.)

While I am still actively working on the implementation of 1. and 3., right now is a good point to merge this back into master before the PR gets too large/unwieldy as everything is working and the tests pass. I will add subsequent optimizations / code cleanups in separate PRs.

Some additional features in this PR:

  • Added Jenkins Pipelines for reproducible performance tests. To this end it includes a new sub-repository ( https://github.com/G-071/octotiger-performance-tests ) containing said performance tests as well as scripts to plot the data and the Jenkins Pipelines that execute the performance tests for any branches with string "performance" as part of the branch name. Resulting tables and plots are sent by emails after the tests finish.
  • Added explicit work aggregation for the experimental AMR-completion CUDA methods.

I add this as a draft PR, as some CI pipelines still show problems. The problems seem to be caused by the changed module environment on rostam (g++ 10 and clang 12 modules crash during loading) and obsolete g++/HPX versions on CircleCI. However, all tests pass locally, so it is likely we can fix this on the server-side independent of the PR (fix the modules on Rostam, upgrade CircleCI dependencies).

@G-071 G-071 requested review from diehlpk and dmarce1 June 22, 2022 14:08
Copy link
Member

@diehlpk diehlpk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check all newly added files for the license header.

G-071 added 5 commits June 23, 2022 18:59
As the rocm 4.5 installation seems to be broken
As the libatomic gcc bug on Rostam seems to be resolved
As we disabled the legacy circleci tests for now
@G-071
Copy link
Member Author

G-071 commented Jun 24, 2022

The license information were missing in quite a few files actually! I have added them accordingly in 4236a52

@G-071
Copy link
Member Author

G-071 commented Jun 24, 2022

I adapted the Jenkins pipelines to the changes on Rostam and removed the legacy tests on CircleCI. I left the script for CircleCI in case we actually want to update these tests and re-enable them, but in their current form they're redundant anyway since the Jenkins tests also cover the same things (and far more).

Once the remaining pipelines are completed, this PR should be good to go!

@diehlpk Could you have another look at it?

I recommend that this PR should be merged before #425, as we need the Jenkins pipeline fixes there as well.

@G-071 G-071 marked this pull request as ready for review June 24, 2022 14:50
Copy link
Member

@diehlpk diehlpk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@diehlpk diehlpk merged commit 6848ea1 into master Jun 24, 2022
@diehlpk diehlpk deleted the hydro_optimizations branch June 24, 2022 23:50
This was referenced Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants