Add MultiGPU Support #450

G-071 · 2023-06-29T02:43:16Z

This PR adds experimental multi-GPU support for all CUDA/KOKKOS kernels in Octo-Tiger.

To do so, it relies mostly on the additions in STEllAR-GROUP/hpx/pull/6284 and SC-SGS/CPPuddle/pull/22. To work correctly, it also

ensures that the required constant data is initialized on all GPUs,
sets the correct device before all Kokkos calls,
and adapts to the API changes needed to initialize the executors on all devices.
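
The first two requirements above (per-device constant initialization and selecting the device before each call) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `fake_runtime`, `device_guard`, and `init_constants_on_all_devices` are hypothetical names, and the runtime is a plain C++ stand-in so the example runs without a GPU. In real CUDA code the marked calls would be `cudaSetDevice`/`cudaGetDevice` and a constant-memory upload such as `cudaMemcpyToSymbol`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the GPU runtime so the sketch runs without a GPU (assumption).
// With CUDA, set_device/get_device would map to cudaSetDevice/cudaGetDevice.
struct fake_runtime {
    int current = 0;                             // currently selected device
    std::vector<std::vector<double>> constants;  // per-device constant data

    explicit fake_runtime(std::size_t num_devices) : constants(num_devices) {}

    void set_device(int id) { current = id; }
    int get_device() const { return current; }

    // With CUDA this would be an upload such as cudaMemcpyToSymbol.
    void upload_constants(const std::vector<double>& data) {
        constants[static_cast<std::size_t>(current)] = data;
    }
};

// RAII guard: selects a device and restores the previous one on scope exit,
// so a call can never run on a device left over from earlier code.
class device_guard {
    fake_runtime& rt_;
    int previous_;

public:
    device_guard(fake_runtime& rt, int id) : rt_(rt), previous_(rt.get_device()) {
        rt_.set_device(id);
    }
    ~device_guard() { rt_.set_device(previous_); }
    device_guard(const device_guard&) = delete;
    device_guard& operator=(const device_guard&) = delete;
};

// Upload the same constant table to every device in [0, number_gpus).
void init_constants_on_all_devices(fake_runtime& rt, int number_gpus,
                                   const std::vector<double>& data) {
    // Safeguard: never address more devices than the runtime reports.
    assert(number_gpus >= 0 &&
           static_cast<std::size_t>(number_gpus) <= rt.constants.size());
    for (int id = 0; id < number_gpus; ++id) {
        device_guard guard(rt, id);  // select device `id` for this iteration
        rt.upload_constants(data);
    }
}
```

The guard restores the previously selected device when it goes out of scope, so the caller's device selection is unchanged after the loop.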

Overall, it seems to work well on the machines I tested it on. I will probably re-structure the MultiGPU part of the gravity solver a bit in the future, but I think that is a subject for another PR. This one gives us the basic functionality for now!


Renders the monopole CUDA kernel inoperable; KOKKOS_CUDA still works, however. CUDA will throw an exception when someone tries to use it in this configuration! For the future, it would be handy to have a CMake variable to disable CUDA at compile time but not KOKKOS_CUDA. This workaround should serve for now.

G-071 · 2023-08-24T17:54:50Z

I reworked this PR (and CPPuddle) as I did not quite like the way the previous MultiGPU support was working! It is now ready for review!

New Features/Fixes:

Add MultiGPU capabilities for the gravity and hydro solvers (works with CUDA, HIP, and Kokkos on both NVIDIA and AMD devices)
Add workaround for the initialization issues on AMD devices
Add option for separate polling threads
Add option for additional time-per-timestep analysis
Add HPX performance counter for kernel launches and work aggregation statistics
Register new CPPuddle performance counters for additional memory statistics

Interface changes:

Renamed existing parameters (to make clearer that they are not CUDA specific anymore):

--cuda_number_gpus -> --number_gpus
--cuda_streams_per_gpu -> --executors_per_gpu
--max_executor_slices -> --max_kernels_fused
--cuda_buffer_capacity -> --max_gpu_executor_queue_length
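
For scripts that still pass the old names, the renames above could be applied mechanically. The helper below is a hypothetical sketch (`migrate_option` is not part of Octo-Tiger); it only rewrites the four option names listed above and passes everything else through unchanged.

```cpp
#include <string>
#include <unordered_map>

// Map a pre-PR option name to its post-PR name; unknown options pass through.
// Accepts both "--name" and "--name=value".
std::string migrate_option(const std::string& arg) {
    static const std::unordered_map<std::string, std::string> renames = {
        {"--cuda_number_gpus", "--number_gpus"},
        {"--cuda_streams_per_gpu", "--executors_per_gpu"},
        {"--max_executor_slices", "--max_kernels_fused"},
        {"--cuda_buffer_capacity", "--max_gpu_executor_queue_length"},
    };
    const auto eq = arg.find('=');
    const std::string name = arg.substr(0, eq);  // whole string if no '='
    const auto it = renames.find(name);
    if (it == renames.end()) {
        return arg;  // not a renamed option
    }
    // Re-attach "=value" if the argument carried one.
    return eq == std::string::npos ? it->second : it->second + arg.substr(eq);
}
```

For example, `migrate_option("--cuda_number_gpus=2")` yields `--number_gpus=2`, while an unrelated argument is returned as-is.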

Add new parameters for new features:

--polling-threads (reserves dedicated threads that only handle GPU polling)
--print_times_per_timestep (prints an additional per-timestep timing analysis after Octo-Tiger finishes)

diehlpk

LGTM!

G-071 added 5 commits June 14, 2023 18:57

Adapt to multi-gpu build (tested on 1 gpu)

10a18b8

Cuda hydro works with multigpu

cb57356

Fix gravity multigpu

e55ff1a

Kokkos multigpu build v1

99a3ad1

Kokkos MultiGPU build v2

db5a55b

G-071 requested a review from diehlpk June 29, 2023 02:43

G-071 changed the title ~~Adapt for multigpu~~ Add MultiGPU Support Jun 29, 2023

G-071 added 23 commits June 29, 2023 12:06

Adapt to mutex type change

5ee5618

Removed hardcoded arguments

b1a71ec

Merge branch 'master' into develop

7cdee51

Add option to print times-per-timestep

b90510a

Allow polling on separate threadpool

3d1f6e4

Based on John's old suggestions in bbcbbca

Reverse core list order

b0da982

Do not disable background work

92b49bd

Fixes

3b880fc

Install tools targets

709f5fc

Add hydro HPX performance counters

b70c303

Add skeleton for gravity performance counters

d62a595

Add gravity hpx performance counters

ebe1096

Merge branch 'develop' of github.com:STEllAR-GROUP/octotiger into dev…

77268b6

…elop

Do not print Kokkos configuration on non-root localities

60e2c9c

Add cppuddle hpx counters

dbeeaa5

Adapt to cppuddle changes

2857b12

WIP, add gpu_id to executor selection

3bc6fad

Fix multigpu for Kokkos and CUDA

d705553

Fix device ID issue

afccbb8

Disable hydro hybrid tests for now

23bbcf0

Fix p2p constant memory issue

879bf2a

Disable remaining cpu_gpu hydro tests

700bce9

G-071 added 13 commits August 21, 2023 00:26

Fix ipr test

cf292cf

Fix multigpu compilation for hip

cdd96a6

Only init constants on specified gpus

a2653fc

Fixes for multi-device amdgpu

4842730

Add num_devices safeguard

0548655

Adapt to cppuddle namespace changes

ae87860

Adapt to cppuddle interface changes v2

c1aecb4

Rename cuda_number_gpus parameter into number_gpus

b46d765

Rename cuda_streams_per_gpu into executors_per_gpu

68701ed

Rename cuda_buffer_capacity into max_gpu_executor_queue_length

34f3c7f

Rename max_executor_slices into max_kernels_fused

b2d4e19

Only use finalize

0364274

Add option check for missing gpu executor

8ad1386

G-071 marked this pull request as ready for review August 24, 2023 17:52

G-071 added 9 commits August 24, 2023 13:51

Fixes for HIP CI

429adb2

Replace reduce with accumulate to support older compilers

b7fb609

Add warnings in case recycling is turned off

8adbad5

Disable 16^3 hip tests due to memory

f017aa1

Re-enable 16^3 hip tests excluding only the memory-intensive ones

e778e14

Reduce hip executors

c06fbef

Disable 16^3 tests again for MI100 CI

9785615

Update gitignore for spack

1bc7a51

Fix FP_FAST test error

d9b4eb3

diehlpk approved these changes Aug 30, 2023


G-071 merged commit f1c451a into master Aug 30, 2023

G-071 deleted the adapt-for-multigpu branch August 30, 2023 19:06
