Specify different-dtype alias TV alignment #3084
Conversation
!build
@@ -2964,10 +2964,13 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
    } else {
      indent() << "// Alias Allocation (changing dtype) - "
               << alloc->memoryType() << "\n";
      auto va = kernel_->summary().vectorized_accesses;
      auto it = va.find(tv);
We could instead use the original TV, whose vectorized accesses are guaranteed to be at least as large as this 🤷♂️
That may not work, since the actual alignment size is sizeof(buffer_dtype) * alias_alignment. If we used the alignment of the original tensor, the computed alignment would be larger than what the buffer actually guarantees whenever the original tensor's data type is smaller.
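A minimal sketch of the arithmetic behind this concern, with made-up dtypes and vectorization widths (none of these numbers come from the PR):

```cpp
#include <cstddef>

// Original buffer: float with a vectorized access width of 4 -> the buffer is
// only guaranteed to be sizeof(float) * 4 = 16-byte aligned.
constexpr std::size_t guaranteed_align = sizeof(float) * 4;   // 16

// Alias: double with a vectorized access width of 2 -> the lowered value,
// sizeof(buffer_dtype) * alias_alignment, is 16 and matches the guarantee.
constexpr std::size_t alias_align = sizeof(double) * 2;       // 16

// Reusing the original tensor's width (4) with the alias dtype would claim
// sizeof(double) * 4 = 32 bytes, more than the buffer actually provides.
constexpr std::size_t overstated_align = sizeof(double) * 4;  // 32

static_assert(overstated_align > guaranteed_align,
              "reusing the original tensor's width overstates the guarantee");

int main() {}
```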
This is currently only used by "inner aliasing", in which the dtype sizes, parallelization, and shapes must all match. I'll leave it as is, though, for the reason you stated: in the future we might allow mismatched shapes and dtype sizes subject to some other constraint.
OK, I see. I think what's more important here is that we should make CudaCodeGen as trivial as possible and minimize its coupling with the lowering passes. It's difficult to keep track of all the assumptions made in the earlier phases of code translation. For this particular case, since lowering passes in the alignment size, that value should just be used as is. If some other valid value should be used instead, that decision should be made by lowering and recorded in the kernel summary.
LGTM
See #2934 (comment)
PR #665 allowed us to re-use allocations that have different dtypes. We already check that our aliased tensors do not have vectorized accesses larger than those of the original tensors. However, when the dtypes differ we reinterpret_cast the buffer to a different Array type. Previously we did not specify any alignment in that type's template args, so it assumed an alignment of 1. Since the actual addresses are all still aligned, this did not cause misaligned accesses at runtime. This PR sets the alignment template arg to the vectorized access width of the alias tensor, so that the compiler could hypothetically do some optimizations knowing the address is aligned.