Use Stride Kinds/IDs in Unstructured Nabla #1799

Merged
merged 3 commits into GridTools:master from nabla-stride-kinds
Aug 5, 2024

Conversation

fthaler
Contributor

@fthaler fthaler commented Jul 24, 2024

Slightly reduces memory usage required for strides / kernel arguments. Gives speedups of almost 9% on Clang/Cray CUDA and around 3.5% on NVCC (for fn_unstructured_nabla_fused_tuple_of_fields, double precision, large domain).
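To illustrate why tagging fields with a stride kind can shrink kernel arguments, here is a minimal sketch. It is not the GridTools implementation and does not use its API; `strides_t`, `field_with_strides`, `field_of_kind`, `args_per_field`, `args_shared`, `load` and `bytes_saved` are hypothetical names made up for this example. The assumption is that fields sharing a compile-time kind ID are guaranteed to have identical strides, so the strides need to be stored only once per kind rather than once per field.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 2D strides type (illustration only, not a GridTools type).
using strides_t = std::array<std::ptrdiff_t, 2>;

// Without stride kinds: every field carries its own copy of the strides.
struct field_with_strides {
    double const *ptr;
    strides_t strides;
};

struct args_per_field {
    field_with_strides a, b, c; // three pointers + three strides copies
};

// With stride kinds: the kind ID is part of the type, and all fields of
// the same kind are assumed to share identical strides.
template <int KindId>
struct field_of_kind {
    double const *ptr;
};

struct args_shared {
    field_of_kind<0> a, b, c; // three pointers
    strides_t strides_kind0;  // stored once for all fields of kind 0
};

// Element access uses the strides that belong to the field's kind.
template <int KindId>
double load(field_of_kind<KindId> f, strides_t const &s,
            std::ptrdiff_t i, std::ptrdiff_t j) {
    return f.ptr[i * s[0] + j * s[1]];
}

// Bytes of kernel-argument storage saved by sharing strides per kind.
constexpr std::size_t bytes_saved() {
    return sizeof(args_per_field) - sizeof(args_shared);
}
```

In this toy layout the saving grows with the number of fields that share a kind; whether it translates into a measurable speedup depends on the compiler and backend, as the numbers above suggest.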

@gridtoolsjenkins
Collaborator

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

@fthaler
Contributor Author

fthaler commented Jul 24, 2024

launch perftest

@fthaler
Contributor Author

fthaler commented Jul 24, 2024

launch jenkins

@fthaler
Contributor Author

fthaler commented Jul 24, 2024

launch perftest

@fthaler
Contributor Author

fthaler commented Aug 5, 2024

launch jenkins

@fthaler
Contributor Author

fthaler commented Aug 5, 2024

launch perftest

@havogt
Contributor

havogt commented Aug 5, 2024

Can you summarize in the commit message how performance changes?

@fthaler fthaler merged commit ffcf790 into GridTools:master Aug 5, 2024
68 checks passed
@fthaler fthaler deleted the nabla-stride-kinds branch August 5, 2024 14:01
havogt pushed a commit that referenced this pull request Sep 30, 2024
Slightly reduces memory usage required for strides / kernel arguments.
Gives speedups of up to almost 9% on Clang/Cray CUDA and around 3.5% on NVCC
(for fn_unstructured_nabla_fused_tuple_of_fields, double precision,
large domain).