Add experimental HIP support #389

Merged
G-071 merged 83 commits into master from cuda_to_hip
Jan 25, 2022

@G-071 (Member) commented Nov 26, 2021

This PR adds the first experimental support for running Octo-Tiger on AMD GPUs.

While the effort regarding a (performance-portable) AMD GPU port is still ongoing, I think this is a good point to get some of the changes into master (before the PR gets too unwieldy).

Changelog:

  • Added HIP kernel variants for most major CUDA kernels [1] (basically by hipifying the API)
  • Reworked CUDA/HIP kernels for increased performance (split the stencils across more blocks)
  • Added support for running most Kokkos kernels [1] on the HIP backend via the hpx-kokkos HIP executors
  • Added CLI options to actually use the new kernels (device kernel types: HIP, KOKKOS_HIP)
  • Added tests for all new kernels
  • Added Jenkins Pipeline for AMD GPU tests on the Rostam MI100 nodes
  • Added Jenkins Pipeline for the Stuttgart Radeon VII Pro node
  • Added warnings that HIP support is still experimental
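
To illustrate the new device kernel types and the experimental warning from the changelog above, here is a minimal C++17 sketch. It is not Octo-Tiger's actual option handling: the enum, both function names, and the `OFF`, `CUDA` and `KOKKOS_CUDA` values are assumptions made for illustration. Only `HIP` and `KOKKOS_HIP` are named in this PR.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical enum. Only HIP and KOKKOS_HIP are named in the PR text;
// OFF, CUDA and KOKKOS_CUDA are assumed here for completeness.
enum class device_kernel_type { OFF, CUDA, HIP, KOKKOS_CUDA, KOKKOS_HIP };

// Map a CLI string to a kernel type (case-sensitive).
// Returns std::nullopt for unknown values.
std::optional<device_kernel_type> parse_device_kernel_type(const std::string& s) {
    if (s == "OFF") return device_kernel_type::OFF;
    if (s == "CUDA") return device_kernel_type::CUDA;
    if (s == "HIP") return device_kernel_type::HIP;
    if (s == "KOKKOS_CUDA") return device_kernel_type::KOKKOS_CUDA;
    if (s == "KOKKOS_HIP") return device_kernel_type::KOKKOS_HIP;
    return std::nullopt;
}

// True for the kernel types that run on the (still experimental) HIP backend.
bool is_experimental_hip(device_kernel_type t) {
    return t == device_kernel_type::HIP || t == device_kernel_type::KOKKOS_HIP;
}

// Print a warning to stderr when an experimental kernel type is selected.
void warn_if_experimental(device_kernel_type t) {
    if (is_experimental_hip(t)) {
        std::cerr << "warning: HIP support is still experimental\n";
    }
}
```

A caller would parse the option value once at startup, reject `std::nullopt` with an error message, and call `warn_if_experimental` on the result before launching any kernels.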

Note:

  • Performance does not yet quite match what we get on NVIDIA devices -> needs more work!
  • The CLI parameter names need renaming ("cuda_streams_per_gpu" is an odd name for the number of Kokkos / HIP executors). As this will change the interface for all existing scripts, I would postpone this task to a later PR.

Build instructions:

  • Set the CMake variables OCTOTIGER_WITH_CUDA=OFF and OCTOTIGER_WITH_HIP=ON (for example, pass -DOCTOTIGER_WITH_CUDA=OFF -DOCTOTIGER_WITH_HIP=ON when configuring)
  • Experimental toolchain branch available here

Remaining ToDos

  • Cleanup


[1]: Except for the p2m kernel so far: it requires more effort to port but accounts for only a small percentage of the runtime.

@G-071 G-071 marked this pull request as ready for review January 22, 2022 18:51
@G-071 G-071 requested review from diehlpk and dmarce1 January 22, 2022 18:52
Review thread on octotiger/profiler.hpp (outdated)

@diehlpk (Member) left a comment

I added two comments which should be addressed.

@diehlpk (Member) left a comment

LGTM!

@G-071 G-071 merged commit 22b6b03 into master Jan 25, 2022
@G-071 G-071 deleted the cuda_to_hip branch January 25, 2022 14:57
@G-071 G-071 restored the cuda_to_hip branch January 25, 2022 14:58
@diehlpk diehlpk deleted the cuda_to_hip branch January 6, 2023 13:30