Move more TPM modules and setup routines to common #148
Conversation
Nice. As before, might take some time to look over this in detail.
Force-pushed dc37408 to d9ccd88 ("This makes TPM_FIELDS now all double precision")
Looks good, but I'd like a bit more assurance that the additional casting won't impact performance. Maybe it's insignificant, but we should demonstrate that with evidence. I'll try running some benchmarks.
Here are some performance results from commit 9198e86 and commit d9ccd88.
I think it's safe to merge this.
Looks good, and nice testing from @samhatfield
Another small step in the direction of using multiple handles with different backends and resolutions.
Moving more TPM modules and setup routines into a common, precision- and backend-independent library removes code duplication and multiple compilation, and allows the globals in these modules to be reused across the different backends.
I managed to move all but one of the TPM modules.
The missing one for now is TPM_FLT, because it essentially contains the Legendre coefficients for each backend in a specific precision. The CPU version also contains the fast Legendre coefficients and butterfly structures, which are not ported to the GPU.
Because of this, it will still take some work to see how we can reuse the Legendre coefficients of the same resolution across different precisions and backends without recomputing them.
Most of the work was spent on TPM_FIELDS. The flattened GPU arrays have been moved to a new TPM_FIELDS_FLAT module, and I understand that the code will soon be refactored to remove the use of the flattened arrays.
Note that results will not be bit-identical, but the change should be performance-neutral. Could someone benchmark/verify this for me?