support always requesting GPUs on partitions that require it #116

Merged: 14 commits into EESSI:main on Feb 29, 2024

Conversation

@smoors (Collaborator) commented Feb 12, 2024

fixes #115

Changes:

  • add a feature always_request_gpus for partitions that require it, and check for it in a new hook check_always_request_gpus, called at the end of the hook assign_tasks_per_compute_unit to ensure it runs after task assignment
  • small hook updates that allow simplifying the OSU test (done for the pt2pt test, not yet for the collective test)
  • filter out the 2_cores scale for device_type gpu (in addition to skipping single-node tests when only 1 GPU is present in the node), as this scale provides only 1 GPU (pt2pt test)
  • add a flake8 configuration to setup.cfg

I did not add the new feature to the config files in the repo, as there are currently no partitions that have both the cpu and gpu features. It does not hurt, however, to add it to all GPU partitions that do not have the cpu feature (it should have no effect there); see the configuration sketch below.
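For reference, a minimal sketch of what enabling this on a GPU partition could look like in a ReFrame site config. The system and partition details below are hypothetical and the snippet is trimmed (a real partition entry needs more fields, and the supported scales are normally appended to the features list as well); only the FEATURES entries mirror what is used later in this thread, and the import assumes the suite's usual eessi.testsuite.constants module:

```python
# Hypothetical, trimmed ReFrame site-config snippet; not taken from the repo configs.
from eessi.testsuite.constants import FEATURES, CPU, GPU, ALWAYS_REQUEST_GPUS

site_configuration = {
    'systems': [
        {
            'name': 'example_system',    # hypothetical system name
            'descr': 'Cluster whose GPU partition must always request GPUs',
            'hostnames': ['login.*'],
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',
                    'access': ['-p gpu'],
                    'environs': ['default'],
                    # This partition supports both CPU-only and GPU runs, but
                    # the site requires every job on it to request GPUs.
                    'features': [
                        FEATURES[CPU],
                        FEATURES[GPU],
                        FEATURES[ALWAYS_REQUEST_GPUS],
                    ],
                },
            ],
        },
    ],
}
```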

@smoors smoors marked this pull request as draft February 13, 2024 12:38
@smoors smoors marked this pull request as ready for review February 14, 2024 08:50
@satishskamath (Collaborator):

@smoors can you merge main into this branch?

@casparvl (Collaborator) left a review comment:

Some small changes. My most fundamental issue is that I'm not convinced the skip_if in set_num_gpus_per_node of osu.py is really needed. And if it is, I think we can at least catch some cases earlier, so that we can avoid generating the tests (rather than generating them, and then skipping).

@satishskamath (Collaborator) commented Feb 16, 2024

Testing results

  • Without the feature always_request_gpus:
#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_73ba759f"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
module load 2023
module load OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
mpirun -np 2 osu_latency -m 8 -x 10 -i 1000 -c
  • With the feature always_request_gpus:
#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_80d6becf"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=4
module load 2023
module load OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
mpirun -np 2 osu_latency -m 8 -x 10 -i 1000 -c

The test requests all the GPUs in the node. @smoors Not sure if that was intended, since the CPU test can work even with 1 GPU. The GPU is not being used at all.
Apart from that, with this PR, I cannot list any collective tests using ReFrame, even after I put in filter_supported_scales. Not sure how that got broken.

@satishskamath (Collaborator):

@smoors and @casparvl: it seems it is not just this PR; the collective tests do not get listed in the main branch either. :D I am creating a new PR to fix this.

@smoors (Collaborator, Author) commented Feb 16, 2024

Some small changes. My most fundamental issue is that I'm not convinced the skip_if in set_num_gpus_per_node of osu.py is really needed. And if it is, I think we can at least catch some cases earlier, so that we can avoid generating the tests (rather than generating them, and then skipping).

We can indeed filter out the 2_cores scale early on to avoid skipping.
We do still need to skip for the 1_node scale in case there is only 1 GPU present in the node (which is indeed a rare case in HPC); see the sketch below.
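A minimal sketch of that combination, with names along the lines of those used in this PR (filter_scales_2gpus, the SCALES constant); the bodies, and the gpus_in_node helper, are simplified illustrations rather than the actual implementation:

```python
# Illustrative sketch only; assumes SCALES entries carry 'num_nodes' and
# 'node_part' keys as in the EESSI test suite constants.
import reframe as rfm
from reframe.core.builtins import parameter, run_after

from eessi.testsuite.constants import SCALES


def filter_scales_2gpus():
    """Keep only the scales that can provide at least 2 GPUs:
    multi-node scales and full-node scales."""
    return [
        name for name, attrs in SCALES.items()
        if attrs.get('num_nodes', 1) > 1 or attrs.get('node_part') == 1
    ]


def gpus_in_node(test):
    """Hypothetical helper: number of GPUs in a node of the current
    partition (the real suite reads this from its configuration)."""
    return 4


class ExampleOSUPt2Pt(rfm.RunOnlyRegressionTest):
    # Filtering here means a variant like 2_cores is never generated for
    # device_type=gpu, instead of being generated and then skipped.
    scale = parameter(filter_scales_2gpus())

    @run_after('setup')
    def skip_single_gpu_nodes(self):
        # Still needed for 1_node: a node with a single GPU (rare on HPC,
        # but possible on e.g. a laptop) cannot run a 2-GPU pt2pt test.
        self.skip_if(
            SCALES[self.scale].get('num_nodes', 1) == 1 and gpus_in_node(self) < 2,
            'need at least 2 GPUs within a single node for the pt2pt test'
        )
```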

@smoors (Collaborator, Author) commented Feb 16, 2024

The test requests all the GPUs in the node. @smoors Not sure if that was intended, since the CPU test can work even with 1 GPU. The GPU is not being used at all.

That is indeed intended. I assume you are requesting all the cores in the node (?), and some sites may require requesting all the GPUs if all the cores are requested (a sketch of that check follows at the end of this comment).

Apart from that, with this PR, I cannot list any collective tests using ReFrame, even after I put in filter_supported_scales. Not sure how that got broken.

Yeah, I did not check the collective tests at all.
I first wanted to make sure we are 100% happy with the pt2pt test; then it should be relatively easy to make the equivalent changes to the collective test.
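As promised above, a minimal sketch of the always_request_gpus check. The real hook is check_always_request_gpus in eessi/testsuite/hooks.py (later renamed with a leading underscore), called at the end of assign_tasks_per_compute_unit; the body below and the gpus_in_node helper are simplified assumptions rather than the actual implementation:

```python
from math import ceil

from eessi.testsuite.constants import FEATURES, ALWAYS_REQUEST_GPUS


def gpus_in_node(test):
    """Hypothetical helper: number of GPUs in a node of the current
    partition (the real suite reads this from its configuration)."""
    return 4


def _check_always_request_gpus(test):
    """If the current partition requires it, make sure the job requests
    GPUs, even for CPU-only test variants (simplified sketch)."""
    if FEATURES[ALWAYS_REQUEST_GPUS] not in test.current_partition.features:
        return
    if test.num_gpus_per_node:
        return  # GPU variants have already requested GPUs

    # Assumption: request GPUs in proportion to the requested share of the
    # node's cores (rounded up), so a full-node CPU-only job also claims
    # all of the node's GPUs.
    cores_per_node = test.current_partition.processor.num_cpus
    requested_cores = test.num_tasks_per_node * (test.num_cpus_per_task or 1)
    share = requested_cores / cores_per_node
    test.num_gpus_per_node = max(1, ceil(gpus_in_node(test) * share))
```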

@smoors (Collaborator, Author) commented Feb 17, 2024

@casparvl I created a filter for scales with < 2 GPUs, see filter_scales_2gpus, but kept the skip_if just in case someone wants to run this on their laptop.

@satishskamath I suggest that we tackle the collective test in another PR, do you agree?

I also couldn't resist doing a bit more reorganizing and cleaning up (no changed functionality), I hope you don't mind :)

Excerpt from the diff:

commands in a @run_before('setup') hook if not equal to 'cpu'.
Therefore, we must set device_buffers *before* the @run_before('setup') hooks.
"""
if self.device_type == DEVICE_TYPES[GPU]:
@smoors (Collaborator, Author) commented on the diff above:

Note: checking for device_type is enough here, as the is_cuda_module check is already done in hooks.filter_valid_systems_by_device_type.

@casparvl (Collaborator):

Ok, so I changed my local config so that our GPU partition now has:

                    'features': [
                        FEATURES[GPU],
                        FEATURES[CPU],
                        FEATURES[ALWAYS_REQUEST_GPUS],
                    ] + valid_scales_snellius_gpu,

The 2_cores and 1_node scales run fine. The 2_nodes scale fails though. I.e.

[     FAIL ] (21/22) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=2_nodes %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /01344f7d @snellius:gpu+default
==> test failed during 'sanity': test staged in '/scratch-shared/casparl/reframe_output/staging/snellius/gpu/default/EESSI_OSU_Micro_Benchmarks_pt2pt_01344f7d'
[     FAIL ] (22/22) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=2_nodes %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /83c9ecfe @snellius:gpu+default
==> test failed during 'sanity': test staged in '/scratch-shared/casparl/reframe_output/staging/snellius/gpu/default/EESSI_OSU_Micro_Benchmarks_pt2pt_83c9ecfe'

The job script is:

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_83c9ecfe"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=4
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks/7.1-1-gompi-2023a
mpirun -np 2 osu_bw -m 4194304 -x 10 -i 1000 -c D D

Note that something strange is happening here: the -c argument is D D, but the test name suggests device_type=cpu. Something seems to have gone wrong here? However, that is NOT what this test fails on:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  osu_bw

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.

That is really strange. There should be two slots, but on two different nodes. I'd think this would just work; no clue why it doesn't. On CPU, this test variant passes without issues:

[       OK ] (14/22) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_cpn_2_nodes %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /339d8b1f @snellius:genoa+default
P: bandwidth: 28108.81 MB/s (r:0, l:None, u:None)

I'd like to test this interactively, but right now, I'm unable to get a GPU node, let alone two...

@satishskamath (Collaborator) commented Feb 20, 2024

@casparvl and @smoors, see https://github.com/EESSI/test-suite/pull/116/files#r1495856364. That is the reason that the function right below it is not removing the D D option from the executable options, which is causing your error.
This most likely occurred after the clean-up.

@casparvl (Collaborator):

Hm, I can imagine this is why there is D D instead of H H. What I cannot imagine is that the mpirun command would downright fail with the complaint of having too few slots...

But, I'll wait for this order to be fixed before trying to debug an issue that might be solved by that...

@smoors (Collaborator, Author) commented Feb 20, 2024

@casparvl and @smoors, see https://github.com/EESSI/test-suite/pull/116/files#r1495856364. That is the reason that the function right below it is not removing the D D option from the executable options, which is causing your error. This most likely occurred after the clean-up.

You're absolutely right, fixed in c1f3a89.
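For context, a minimal sketch of the ordering that c1f3a89 restores: device_buffers has to be set in a phase that runs before the hook composing the executable options, otherwise a CPU variant is built from a stale value and keeps the D D arguments. Hook and attribute names follow the discussion above; the bodies are simplified (for instance, appending options instead of filtering a default list):

```python
import reframe as rfm
from reframe.core.builtins import parameter, run_after, run_before

from eessi.testsuite.constants import DEVICE_TYPES, CPU, GPU


class ExampleOSUPt2Pt(rfm.RunOnlyRegressionTest):
    device_type = parameter([DEVICE_TYPES[CPU], DEVICE_TYPES[GPU]])
    device_buffers = 'cpu'

    @run_after('init')
    def set_device_buffers(self):
        # Must run before the @run_before('setup') hook below; if the order
        # is reversed, the options are composed from a stale device_buffers
        # value and a CPU variant ends up running with 'D D'.
        if self.device_type == DEVICE_TYPES[GPU]:
            self.device_buffers = 'cuda'

    @run_before('setup')
    def set_executable_opts(self):
        # OSU pt2pt benchmarks take the buffer location of both ranks as
        # trailing arguments: 'D D' = device (GPU) buffers, default is host.
        if self.device_buffers == 'cuda':
            self.executable_opts += ['-d', 'cuda', 'D', 'D']
```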

@satishskamath (Collaborator) commented Feb 20, 2024

Hm, I can imagine this is why there is D D instead of H H. What I cannot imagine is that the mpirun command would downright fail with the complaint of having too few slots...

But, I'll wait for this order to be fixed before trying to debug an issue that might be solved by that...

Latest results with Sam's fix:
@casparvl and @smoors

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_83c9ecfe"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=4
module load 2023
module load OSU-Micro-Benchmarks/7.1-1-gompi-2023a
mpirun -np 2 osu_bw -m 4194304 -x 10 -i 1000 -c

Output:

# OSU MPI Bandwidth Test v7.1
# Size      Bandwidth (MB/s)        Validation
# Datatype: MPI_CHAR.
1                       5.60              Pass
2                      11.24              Pass
4                      22.40              Pass
8                      44.92              Pass
16                     89.70              Pass
32                    179.78              Pass
64                    345.09              Pass
128                   684.48              Pass
256                  1258.13              Pass
512                  2317.53              Pass
1024                 3827.23              Pass
2048                 6659.48              Pass
4096                11999.44              Pass
8192                16906.25              Pass
16384               18146.99              Pass
32768               32736.49              Pass
65536               37084.64              Pass
131072              39673.78              Pass
262144              46274.14              Pass
524288              48739.29              Pass
1048576             48940.37              Pass
2097152             49176.59              Pass
4194304             49298.15              Pass

JOB STATISTICS
==============
Job ID: 5303230
Cluster: snellius
User/Group: satishk/satishk
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 72
CPU Utilized: 00:02:06
CPU Efficiency: 0.64% of 05:28:48 core-walltime
Job Wall-clock time: 00:02:17
Memory Utilized: 275.01 MB
Memory Efficiency: 1.12% of 24.00 GB

So the error is indeed gone with the disappearance of D D.

@smoors (Collaborator, Author) commented Feb 24, 2024

@casparvl the following job worked for me (using srun), so your failed job seems specific to mpirun or to your cluster:

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_cd750be1"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH --partition=ampere_gpu
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=2
module load OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
srun --cpus-per-task=32 osu_latency -m 8 -x 10 -i 1000 -c -d cuda D D

@casparvl (Collaborator):

Sorry for the delay here. I really want to try and have another look today. I tried last week, and think I got things to pass then as well, but didn't have time to really assess properly...

@satishskamath (Collaborator):

I checked the scales 1_node, 1_cpn_2_nodes, 2_cores, and 2_nodes for OSU, and all of them work well with the latest commit.

@satishskamath (Collaborator) left a review comment:

Apart from that comment, which can also be changed later, I approve this PR. Waiting for @casparvl.

@casparvl (Collaborator) left a review comment:

What about my request to prepend check_always_request_gpus with an underscore? :) I got a thumbs up on that, but I don't think it was changed, right?

@casparvl (Collaborator):

Btw, I tested, and everything runs fine now. I also see the correct number of GPUs requested, i.e. for 2_cores I get:

$ cat /scratch-shared/casparl/reframe_output/staging/snellius/gpu/default/EESSI_OSU_Micro_Benchmarks_pt2pt_9a736b93/rfm_job.sh
#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_9a736b93"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=1
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks/7.1-1-gompi-2023a
mpirun -np 2 osu_bw -m 4194304 -x 10 -i 1000 -c

While for 2_nodes I get:

$ cat /scratch-shared/casparl/reframe_output/staging/snellius/gpu/default/EESSI_OSU_Micro_Benchmarks_pt2pt_83c9ecfe/rfm_job.sh
#!/bin/bash
#SBATCH --job-name="rfm_EESSI_OSU_Micro_Benchmarks_pt2pt_83c9ecfe"
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p gpu
#SBATCH --export=None
#SBATCH --mem=12GB
#SBATCH --gpus-per-node=4
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks/7.1-1-gompi-2023a
mpirun -np 2 osu_bw -m 4194304 -x 10 -i 1000 -c

That is as intended, and conforms to the node_part for those scales.
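As a quick sanity check of those numbers, assuming the hook requests GPUs in proportion to the requested share of a node's cores (rounded up, with a minimum of 1) and that these Snellius GPU nodes have 72 cores and 4 GPUs, as the job scripts above suggest:

```python
from math import ceil

cores_per_node, gpus_per_node = 72, 4  # per the job scripts above

# 2_cores: 2 tasks x 1 cpu-per-task = 2 of 72 cores requested
print(max(1, ceil(gpus_per_node * 2 / cores_per_node)))   # -> 1, matches --gpus-per-node=1

# 2_nodes: 1 task x 72 cpus-per-task = the whole node requested
print(max(1, ceil(gpus_per_node * 72 / cores_per_node)))  # -> 4, matches --gpus-per-node=4
```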

@smoors (Collaborator, Author) commented Feb 29, 2024

What about my request to prepend check_always_request_gpus with an underscore? :) I got a thumbs up on that, but I don't think it was changed, right?

Forgot to actually do it; fixed in 82891ba.

@casparvl (Collaborator) left a review comment:

LGTM!

@casparvl merged commit ba35eb2 into EESSI:main on Feb 29, 2024; 9 checks passed.
Development

Successfully merging this pull request may close these issues.

handle running non-GPU jobs on GPU partitions
3 participants