C2SM/scaling_analysis

Toolset to perform scaling analysis of ICON

It has been tested on Piz Daint (CSCS) and has been used to produce the technical part of production projects at CSCS.

On Euler (ETHZ), only limited functionality is provided for the analysis of ICON. See Limitations on Euler below for more information.

Below is a description of each script and a recipe.

1. Configure and compile your model as usual.

2. Prepare your running script.

Using conda, you can create your environment with:

$ conda env create -f environment.yaml

To load your environment, simply type:

$ conda activate scaling_analysis

Prepare your machine-independent setting file my_exp (e.g. exp.atm_amip, without the '.run').

3. Create and launch several running scripts based on my_exp, each using a different number of nodes.

Use send_several_run_ncpus_perf.py. For example, to run my_exp on 1, 10, 12 and 15 nodes:

$ python [path_to_scaling_analysis_tool]/send_several_run_ncpus_perf.py -e my_exp -n 1 10 12 15

With the command above, 4 running scripts will be created (exp.my_exp_nnodes1.run, exp.my_exp_nnodes10.run, exp.my_exp_nnodes12.run and exp.my_exp_nnodes15.run), and each of them will be launched.

To send several experiments with different node numbers at once, use send_analyse_different_exp_at_once.py from inside <path_to_scaling_analysis_tool>:

$ python send_analyse_different_exp_at_once.py

With n_step = 1, send_analyse_different_exp_at_once.py acts as a wrapper that calls send_several_run_ncpus_perf.py for several experiments (for example different set-ups or compilers); a minimal sketch is given below.

With n_step = 2, the same script collects the wallclock times from the log files of those experiments (step 4 of this README).
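
For orientation, a minimal sketch of such a wrapper could look as follows; the experiment names below are illustrative assumptions, not the actual contents of send_analyse_different_exp_at_once.py:

# Illustrative sketch of the two-step wrapper; the experiment names are placeholders.
import subprocess

experiments = ["my_exp_intel", "my_exp_cray"]  # assumed example set-ups/compilers
n_step = 1  # 1: submit the runs, 2: collect the wallclock times into tables

for exp in experiments:
    if n_step == 1:
        subprocess.run(["python", "send_several_run_ncpus_perf.py",
                        "-e", exp, "-n", "1", "10", "12", "15"], check=True)
    elif n_step == 2:
        subprocess.run(["python", "create_scaling_table_per_exp.py",
                        "-e", exp, "-m", "icon"], check=True)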

4. When all the runs are finished, read all the slurm/log files to get the Wallclock for each run, and put them into a table:

Use the option -m icon:

$ python [path_to_scaling_analysis_tool]/create_scaling_table_per_exp.py -e my_exp -m icon

Or, for several experiments at once, use send_analyse_different_exp_at_once.py with n_step = 2 (cf. step 3); a conceptual sketch of this step follows below.
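
Conceptually, this step boils down to something like the sketch below; the log-file naming and the exact format of the wallclock line are assumptions, and the real parsing is done by create_scaling_table_per_exp.py:

# Conceptual sketch only: scan the slurm/log files for a wallclock line and
# collect one row per node count. The file naming and the line format are assumed.
import csv
import glob
import re

rows = []
for logfile in sorted(glob.glob("LOG.exp.my_exp_nnodes*.run.*")):
    nnodes = int(re.search(r"nnodes(\d+)", logfile).group(1))
    with open(logfile) as f:
        for line in f:
            match = re.search(r"Wallclock\s*[:=]?\s*([\d.]+)", line)
            if match:
                rows.append({"n_nodes": nnodes, "wallclock_s": float(match.group(1))})
                break

with open("my_exp_scaling_table.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["n_nodes", "wallclock_s"])
    writer.writeheader()
    writer.writerows(sorted(rows, key=lambda r: r["n_nodes"]))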

5. Create a summary plot and table of the variable you wish (Efficiency, NH, Wallclock) as a function of the number of nodes, for different experiments.

If needed, you can define the line properties of each experiment in def_exps_plot.py.

$ python [path_to_scaling_analysis_tool]/plot_perfs.py
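
As background, parallel efficiency is commonly defined relative to the smallest node count, Eff(N) = (T_ref * N_ref) / (T(N) * N). A minimal plotting sketch along these lines is shown below; the CSV file name, its columns and the efficiency definition are assumptions, not necessarily what plot_perfs.py does internally:

# Minimal plotting sketch; the input file and its columns are assumed, and the
# efficiency definition (relative to the smallest node count) is a common
# convention, not necessarily the one used by plot_perfs.py.
import matplotlib.pyplot as plt
import pandas as pd

table = pd.read_csv("my_exp_scaling_table.csv").sort_values("n_nodes")
n_ref = table["n_nodes"].iloc[0]
t_ref = table["wallclock_s"].iloc[0]
table["efficiency"] = (t_ref * n_ref) / (table["wallclock_s"] * table["n_nodes"])

plt.plot(table["n_nodes"], table["efficiency"], marker="o", label="my_exp")
plt.xlabel("Number of nodes")
plt.ylabel("Parallel efficiency")
plt.legend()
plt.savefig("efficiency_my_exp.png")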

Limitations on Euler

  • The scaling analysis tools were tested for ICON only.
  • Because the node architecture on Euler differs, the numbers passed via the -n option correspond to numbers of Euler cores, not nodes.
  • To obtain comparable plots, the number of Euler cores needs to be divided by 12 (see the sketch after this list).
  • The automatic runtime specification is not as smooth as on Piz Daint: a minimum runtime of 20 min is requested in any case.
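
For illustration, the division by 12 amounts to the following (the core counts are made-up examples):

# Illustrative only: on Euler, the -n values are core counts, and dividing by 12
# gives the value used on the plot axis. The example values are made up.
euler_cores = [24, 48, 96, 192]            # values that would be passed via -n
plot_axis = [c / 12 for c in euler_cores]  # -> [2.0, 4.0, 8.0, 16.0]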
