
Move more TPM modules and setup routines to common #148

Merged
merged 18 commits into ecmwf-ifs:develop on Oct 3, 2024

Conversation

wdeconinck
Collaborator

Another small step in the direction of using multiple handles with different backends and resolutions.

Moving more TPM modules and setup routines to a common, precision- and backend-independent library avoids code duplication and multiple compilation, and makes it possible to reuse the globals held in these modules across the different backends.
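As a rough illustration of the idea (module and variable names here are hypothetical, not the actual ecTrans code), a module that holds only precision- and backend-independent setup state could be compiled once and shared by every backend:

```fortran
! Hypothetical sketch (not the actual ecTrans code): a TPM-style module that
! holds only precision- and backend-independent setup state, so one compiled
! instance can be shared by CPU/GPU and single/double-precision builds.
MODULE TPM_COMMON_EXAMPLE
IMPLICIT NONE
SAVE

INTEGER :: NSMAX  = -1        ! spectral truncation
INTEGER :: NDGL   = -1        ! number of Gaussian latitudes
LOGICAL :: LSETUP = .FALSE.   ! has setup been called?

CONTAINS

SUBROUTINE SETUP_COMMON_EXAMPLE(KSMAX, KDGL)
  INTEGER, INTENT(IN) :: KSMAX, KDGL
  NSMAX  = KSMAX
  NDGL   = KDGL
  LSETUP = .TRUE.
END SUBROUTINE SETUP_COMMON_EXAMPLE

END MODULE TPM_COMMON_EXAMPLE
```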

I managed to move all but one of the TPM modules.
The remaining one for now is TPM_FLT, because it essentially contains the Legendre coefficients for each backend in a specific precision. The CPU version also contains the fast-Legendre coefficients and butterfly structures, which have not been ported to the GPU.
Because of this, it will still take some work to see how we can reuse the Legendre coefficients of the same resolution across different precisions and backends without recomputing them.
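For contrast with the sketch above (again with hypothetical names, not the real module contents), a TPM_FLT-like module ties its coefficient tables to one real kind, which is why sharing a single instance across precisions would need recomputation or conversion:

```fortran
! Hypothetical sketch of precision-coupled state of the TPM_FLT kind: the
! Legendre coefficient table is declared with a fixed real kind, so one
! compiled instance cannot simply be shared between sp and dp backends.
MODULE TPM_FLT_EXAMPLE
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : REAL64
IMPLICIT NONE
SAVE

! Legendre polynomial values, dimensioned (latitude, spectral coefficient)
REAL(KIND=REAL64), ALLOCATABLE :: ZPNM(:,:)

END MODULE TPM_FLT_EXAMPLE
```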

Most of the work went into TPM_FIELDS. The flattened GPU arrays have been moved to a new TPM_FIELDS_FLAT module; I understand that the code will soon be refactored to remove the use of the flattened arrays.
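A minimal sketch of that split (names hypothetical): the contiguous 1-D copies used for GPU transfers live in their own module, separate from the common fields module:

```fortran
! Hypothetical sketch of a TPM_FIELDS_FLAT-style module: contiguous 1-D
! copies of 2-D field arrays, which are straightforward to map to GPU memory.
MODULE TPM_FIELDS_FLAT_EXAMPLE
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : REAL64
IMPLICIT NONE
SAVE

REAL(KIND=REAL64), ALLOCATABLE :: ZFIELD_FLAT(:)

CONTAINS

SUBROUTINE FLATTEN_EXAMPLE(PFIELD)
  ! Pack a 2-D array column-major into the flat 1-D buffer.
  REAL(KIND=REAL64), INTENT(IN) :: PFIELD(:,:)
  ZFIELD_FLAT = RESHAPE(PFIELD, [SIZE(PFIELD)])
END SUBROUTINE FLATTEN_EXAMPLE

END MODULE TPM_FIELDS_FLAT_EXAMPLE
```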

Note that results will not be bit-identical, but the change should hopefully be performance-neutral. Could someone benchmark/verify this for me?

@samhatfield
Collaborator

Nice. As before, might take some time to look over this in detail.

Collaborator

@samhatfield left a comment


Looks good, but I'd like a bit more assurance that the additional casting won't impact performance. Maybe it's insignificant, but we should demonstrate that with evidence. I'll try running some benchmarks.
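For context, the cast in question is of roughly this shape (a hedged illustration with hypothetical names, not the actual code path): a global stored in one precision in a common module has to be converted to the backend's working precision:

```fortran
! Illustration only (hypothetical names): data held in double precision in a
! shared module is converted when a single-precision backend consumes it.
! Done once at setup this is cheap; inside a hot loop it is worth measuring.
PROGRAM CAST_EXAMPLE
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : REAL32, REAL64
IMPLICIT NONE
REAL(KIND=REAL64), ALLOCATABLE :: ZCOMMON(:)  ! as stored in the common module
REAL(KIND=REAL32), ALLOCATABLE :: ZWORK(:)    ! backend working precision
ALLOCATE(ZCOMMON(1000), ZWORK(1000))
ZCOMMON = 1.0_REAL64
ZWORK = REAL(ZCOMMON, KIND=REAL32)            ! the additional cast
END PROGRAM CAST_EXAMPLE
```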

@samhatfield added the enhancement label on Oct 1, 2024
@samhatfield
Collaborator

Here are some performance results from --norms --niter 100 --nlev 137 --nfld 1 --vordiv --uvders --scders --meminfo:

commit: 9198e86

number_of_nodes  truncation  precision  inv_med  dir_med  loop_med  max_error_combined
1                502         sp         0.297    0.1541   0.4511    8.94e-06
2                633         sp         0.2606   0.1323   0.3929    4.47e-06
4                798         sp         0.2306   0.1255   0.356     3.46e-06
8                1006        sp         0.2303   0.1524   0.3826    6.07e-06

commit: d9ccd88

number_of_nodes  truncation  precision  inv_med  dir_med  loop_med  max_error_combined
1                502         sp         0.2963   0.1549   0.4511    6.66e-06
2                633         sp         0.2789   0.1413   0.4202    7.17e-06
4                798         sp         0.2306   0.1239   0.3544    4.72e-06
8                1006        sp         0.2143   0.1362   0.3509    7e-06

I think it's safe to merge this.

@samhatfield self-requested a review on October 3, 2024, 08:35
Collaborator

@marsdeno left a comment


Looks good, and nice testing from @samhatfield

@marsdeno merged commit 1d516ae into ecmwf-ifs:develop on Oct 3, 2024
11 checks passed
@wdeconinck deleted the feature/more_common branch on October 4, 2024, 10:09