Add logic to avoid reallocating ZCOMBUF[RS] at each call #90

Merged

samhatfield
Collaborator

This is a significant optimisation of the CPU code path. Credit owed to @marsdeno.
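
For readers unfamiliar with the pattern named in the title, here is a minimal sketch of a grow-only buffer cache. It is written in Python for brevity and is not the code in this PR: the class `CommBufferCache`, its `ensure` method, and the `allocations` counter are hypothetical names, and the assumption is simply that the send and receive buffers are kept between calls and reallocated only when a call needs more space than is already held.

```python
from array import array


class CommBufferCache:
    """Hypothetical grow-only cache for a send/receive buffer pair."""

    def __init__(self):
        self.send = array("d")   # empty until the first call
        self.recv = array("d")
        self.allocations = 0     # number of times storage was (re)allocated

    def ensure(self, n_send, n_recv):
        """Return buffers holding at least n_send and n_recv elements."""
        # Reallocate only when the cached buffer is too small for this call.
        if len(self.send) < n_send:
            self.send = array("d", [0.0]) * n_send
            self.allocations += 1
        if len(self.recv) < n_recv:
            self.recv = array("d", [0.0]) * n_recv
            self.allocations += 1
        return self.send, self.recv


cache = CommBufferCache()
for _ in range(100):                 # e.g. 100 iterations, as with --niter 100
    send, recv = cache.ensure(1000, 500)
print(cache.allocations)             # allocations happen on the first call only
```

With constant buffer sizes across iterations, the allocation cost is paid once rather than on every call, which is the effect this kind of change aims for.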

TCO1279, 48-node benchmark (--norms --truncation 1279 --niter 100 --nlev 137 --nfld 1 --vordiv --uvders --scders -v):

  • develop:
Inverse-direct transforms
-------------------------
avg  (s):   0.4258
min  (s):   0.3726
max  (s):   1.2771
med  (s):   0.4168
loop (s):  50.9419
  • pre_allocated_buffers:
Inverse-direct transforms
-------------------------
avg  (s):   0.2227
min  (s):   0.1793
max  (s):   1.1176
med  (s):   0.2128
loop (s):  30.9310

The median transform time drops from 0.4168 s to 0.2128 s, a speed-up of almost 2x, with identical norms.
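
As a quick check of the figures above, the per-metric speed-ups can be computed directly from the two timing blocks. This is a small Python sketch; the dictionary names `develop` and `pre_allocated` are just labels for the two branches, and the values are copied from the benchmark output.

```python
# Timings in seconds, copied from the benchmark output above.
develop = {"avg": 0.4258, "min": 0.3726, "max": 1.2771, "med": 0.4168, "loop": 50.9419}
pre_allocated = {"avg": 0.2227, "min": 0.1793, "max": 1.1176, "med": 0.2128, "loop": 30.9310}

# Speed-up = old time / new time for each reported metric.
speedup = {key: develop[key] / pre_allocated[key] for key in develop}

for key, value in speedup.items():
    print(f"{key:>4}: {value:.2f}x")
```

The median and average ratios both come out a little under 2x, while the whole-loop ratio is lower, at roughly 1.65x.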

@samhatfield samhatfield self-assigned this May 3, 2024
@samhatfield samhatfield requested a review from marsdeno May 3, 2024 14:18
@marsdeno
Collaborator

marsdeno commented May 7, 2024

Looks good to me

@marsdeno marsdeno merged commit 627a857 into ecmwf-ifs:develop May 7, 2024
11 checks passed
@samhatfield samhatfield deleted the samhatfield/pre_allocated_buffers branch May 7, 2024 09:25