Improve neighbor loads in `horizontal_shift` of `gtfn` #1779

iomaganaris · 2024-06-04T07:42:51Z

Define the neighbor as a separate variable to avoid extra memory loads in `horizontal_shift()`.

gridtoolsjenkins · 2024-06-04T07:45:03Z

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

iomaganaris · 2024-06-04T07:49:21Z

launch jenkins

iomaganaris · 2024-06-04T08:59:34Z

launch perftests

fthaler · 2024-06-10T12:26:42Z

include/gridtools/fn/unstructured.hpp

@@ -84,7 +84,8 @@ namespace gridtools::fn {
        template <class Tag, class Ptr, class Strides, class Domain, class Conn, class Offset>
        GT_FUNCTION constexpr auto horizontal_shift(iterator<Tag, Ptr, Strides, Domain> const &it, Conn, Offset) {
            auto const &table = host_device::at_key<Conn>(it.m_domain.m_tables);
-            auto new_index = it.m_index == -1 ? -1 : get<Offset::value>(neighbor_table::neighbors(table, it.m_index));
+            const auto neighbor = get<Offset::value>(neighbor_table::neighbors(table, it.m_index));


This gives an out-of-bounds access if `it.m_index` is indeed -1, doesn't it? (That is a functional change from the previous code, so it makes sense that the compiler can apply different optimizations, e.g., load the neighbor table independently of the value of `it.m_index`.)

iomaganaris · 2024-06-10

Yes, that's true, so it makes sense. Is there any case in which the `m_index` check can be avoided? I'm wondering what is special about the nabla kernels, since they were the only ones that benefited from this change. Also, if the total runtime of the nabla kernels in the production case is small, it's probably not worth trying to improve this part.
@havogt since this change alters the behavior and is possibly problematic, should there be a test in the CI for a breaking case?

havogt · 2024-06-10

Right, I guess we should go for the compile-time `has_skip_value` check to achieve this, and possibly even better performance?

iomaganaris · 2024-06-10

Yes, I can try it out and launch the perftests CI to check.
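
For readers following the thread, here is a minimal sketch of the two variants discussed above: the run-time guard from the original code, and a compile-time switch that keeps the guard only for tables that may contain skip values. Everything in it is hypothetical (`toy_table`, its `has_skip_values` flag, `shift_guarded`, `shift_static`); it does not reproduce the GridTools iterator, `neighbor_table::neighbors`, or the actual `has_skip_value` trait.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a neighbor table, NOT the GridTools type.
// HasSkipValues says whether an index of -1 can occur for this connectivity.
template <std::size_t N, bool HasSkipValues>
struct toy_table {
    static constexpr bool has_skip_values = HasSkipValues;
    std::vector<std::array<int, N>> rows;
};

// Shape of the original code: the table is read only when the index is valid,
// so an index of -1 never reaches the subscript.
template <std::size_t Offset, class Table>
int shift_guarded(Table const &table, int index) {
    return index == -1 ? -1 : table.rows[static_cast<std::size_t>(index)][Offset];
}

// Compile-time variant: the run-time check survives only where -1 is possible.
template <std::size_t Offset, class Table>
int shift_static(Table const &table, int index) {
    if constexpr (Table::has_skip_values) {
        return shift_guarded<Offset>(table, index);
    } else {
        // Assumption of this sketch: without skip values the index is never -1.
        assert(index >= 0);
        return table.rows[static_cast<std::size_t>(index)][Offset];
    }
}
```

The unguarded branch is only correct under the stated assumption. Hoisting the load out of the guard for every table, as in the diff above, reads `rows[-1]` whenever the index is -1.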

fthaler · 2024-06-10T12:32:28Z

Another thing I noticed: the data layout of the ‘default’ neighbor tables (that is, arrays of tuples) is actually pretty bad: it does not allow for coalesced loads in a reasonable way and does not guarantee any alignment.
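
To make the layout remark concrete, the sketch below contrasts an array-of-tuples table with a transposed, slot-major one. The names (`aos_table`, `soa_table`, `transpose`) are hypothetical and the transposed layout is only an illustration, not something proposed in this thread. With the array-of-tuples layout, threads that handle consecutive elements and read the same neighbor slot touch addresses `N * sizeof(int)` bytes apart, and for `N = 3` each row is 12 bytes, so rows do not stay 16-byte aligned. In the transposed layout the same reads are adjacent in memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-tuples layout: all neighbors of one element are contiguous.
template <std::size_t N>
struct aos_table {
    std::vector<std::array<int, N>> rows;
    int neighbor(std::size_t element, std::size_t slot) const {
        assert(slot < N && element < rows.size());
        return rows[element][slot];
    }
};

// Hypothetical transposed layout: one slot of consecutive elements is contiguous.
template <std::size_t N>
struct soa_table {
    std::size_t size;      // number of elements
    std::vector<int> data; // slot-major: data[slot * size + element]
    int neighbor(std::size_t element, std::size_t slot) const {
        assert(slot < N && element < size);
        return data[slot * size + element];
    }
};

// Copy an array-of-tuples table into the slot-major layout.
template <std::size_t N>
soa_table<N> transpose(aos_table<N> const &in) {
    soa_table<N> out{in.rows.size(), std::vector<int>(N * in.rows.size())};
    for (std::size_t e = 0; e < out.size; ++e)
        for (std::size_t s = 0; s < N; ++s)
            out.data[s * out.size + e] = in.rows[e][s];
    return out;
}
```

Whether the transposed layout actually pays off depends on the access pattern of the kernel, which this sketch does not measure.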

iomaganaris · 2024-06-11T07:14:59Z

launch perftests

iomaganaris · 2024-06-11T07:16:38Z

launch jenkins

@lukasm91

Credits to @lukasm91 for hinting at `__builtin_assume` in `deref`. On compilers more recent than the 11.2 we have in CI on daint, this will improve codegen to the level of #1779, but is safe.

Co-authored-by: Felix Thaler
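
As a rough illustration of the `__builtin_assume` idea, the sketch below wraps a compiler hint in a macro and uses it in a toy dereference function. `SKETCH_PROMISE` and `deref_sketch` are hypothetical names; this is not the `GT_PROMISE` definition from #1785, and it assumes that the dereference is only ever reached with a valid index.

```cpp
#include <cassert>

// Hypothetical macro, not the GridTools one: tell the optimizer that x holds.
#if defined(__clang__)
#define SKETCH_PROMISE(x) __builtin_assume(x)
#elif defined(__GNUC__)
#define SKETCH_PROMISE(x)               \
    do {                                \
        if (!(x))                       \
            __builtin_unreachable();    \
    } while (false)
#else
#define SKETCH_PROMISE(x) ((void)0)
#endif

// Toy stand-in for a dereference: the caller promises that index is not -1.
inline int deref_sketch(int const *data, int index) {
    assert(index != -1);         // checked in debug builds
    SKETCH_PROMISE(index != -1); // optimization hint; undefined behavior if violated
    return data[index];
}
```

Unlike hoisting the load out of the guard, the shift itself keeps its check; the promise only states a condition at the point where an invalid index would already be an error.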

iomaganaris · 2024-06-18T12:59:15Z

Superseded by #1785 since this PR is problematic

iomaganaris requested a review from havogt June 4, 2024 07:42

iomaganaris changed the base branch from try_remove_cuda_arch to master June 4, 2024 08:00

havogt and others added 4 commits June 6, 2024 10:21

Bug: storage/gpu.h functions within __CUDA_ARCH__

0a486d3

within GT_CUDACC

1e6b074

Define separately neighbor variable to avoid extra memory loads in horizontal_shift()

3faab94

Add my name on AUTHORS

d302b0b

iomaganaris force-pushed the neighbor_read_perf_improvement branch from ca10767 to d302b0b Compare June 6, 2024 08:22

fthaler reviewed Jun 10, 2024

havogt mentioned this pull request Jun 18, 2024

Introduce GT_PROMISE for __builtin_assume #1785

Merged

iomaganaris closed this Jun 18, 2024
