Add MultiGPU Support #450

Merged
G-071 merged 50 commits into master from adapt-for-multigpu
Aug 30, 2023

Member

@G-071 G-071 commented Jun 29, 2023

This PR adds experimental multi-GPU support for all CUDA/Kokkos kernels in Octo-Tiger.

It builds mostly on the additions in STEllAR-GROUP/hpx/pull/6284 and SC-SGS/CPPuddle/pull/22. To work correctly with that new functionality, this PR also

  • ensures that the required constant data is initialized on all GPUs,
  • sets the correct device before all Kokkos calls, and
  • adapts to the API changes needed to initialize the executors on all devices.

Overall, it seems to work well on the machines I tested it on. I will probably re-structure the MultiGPU part of the gravity solver a bit in the future, but I think that is a subject for another PR. This one gives us the basic functionality for now!

@G-071 G-071 requested a review from diehlpk June 29, 2023 02:43
@G-071 G-071 changed the title Adapt for multigpu Add MultiGPU Support Jun 29, 2023
@G-071 G-071 marked this pull request as ready for review August 24, 2023 17:52
@G-071
Member Author

G-071 commented Aug 24, 2023

I reworked this PR (and CPPuddle) because I did not quite like how the previous MultiGPU support worked. It is ready for review now!

New Features/Fixes:

  • Add MultiGPU capabilities for the gravity and hydro solvers (works with CUDA, HIP, and Kokkos on both NVIDIA and AMD devices)
  • Add workaround for the initialization issues on AMD devices
  • Add option for separate polling threads
  • Add option for additional time-per-timestep analysis
  • Add HPX performance counter for kernel launches and work aggregation statistics
  • Register new CPPuddle performance counters for additional memory statistics

Interface changes:

Renamed existing parameters (to make it clearer that they are no longer CUDA-specific):

  • --cuda_number_gpus -> --number_gpus
  • --cuda_streams_per_gpu -> --executors_per_gpu
  • --max_executor_slices -> --max_kernels_fused
  • --cuda_buffer_capacity -> --max_gpu_executor_queue_length
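
The renames above form a one-to-one mapping, so existing launch scripts can be updated mechanically. Here is a minimal sketch in Python, assuming options are passed either as `--name=value` or as `--name value`. The `migrate_args` helper is hypothetical and not part of Octo-Tiger; only the option names come from this PR.

```python
# Old -> new option names, copied from the rename list in this PR.
RENAMED_OPTIONS = {
    "--cuda_number_gpus": "--number_gpus",
    "--cuda_streams_per_gpu": "--executors_per_gpu",
    "--max_executor_slices": "--max_kernels_fused",
    "--cuda_buffer_capacity": "--max_gpu_executor_queue_length",
}


def migrate_args(args):
    """Return a copy of args with renamed options replaced.

    Hypothetical helper for updating old launch scripts; values and
    unrelated options pass through unchanged.
    """
    migrated = []
    for arg in args:
        # Split "--name=value" into its parts; a bare "--name" or a
        # standalone value leaves sep and value empty.
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return migrated
```

For example, `migrate_args(["--cuda_number_gpus=2", "--cuda_streams_per_gpu", "8"])` returns `["--number_gpus=2", "--executors_per_gpu", "8"]`.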

Add new parameters for new features:

  • --polling-threads (can be used to dedicate specific threads to GPU polling only)
  • --print_times_per_timestep (prints additional timestep analysis after Octo-Tiger finishes)

Member

@diehlpk diehlpk left a comment

LGTM!

@G-071 G-071 merged commit f1c451a into master Aug 30, 2023
@G-071 G-071 deleted the adapt-for-multigpu branch August 30, 2023 19:06