Specify different-dtype alias TV alignment (#3084)

See #2934 (comment). PR #665 allowed us to re-use allocations that have different dtypes. We already check that our aliased tensors do not have vectorized accesses larger than those of the original tensors. However, when the dtypes differ we `reinterpret_cast` the original buffer to a different `Array` type. Previously we did not specify any alignment in that type's template args, meaning it assumed an alignment of 1. Since the actual addresses are all still aligned, this did not cause misaligned accesses at runtime. This PR sets the alignment template arg to the vectorized access width of the alias tensor, so that the compiler can, in principle, apply optimizations that rely on the address being aligned.
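To illustrate the effect of the alignment template arg, here is a minimal host-side sketch. The `Array` struct below is a simplified stand-in written for this example, not the actual runtime definition, and `assumedAlignment` and `isAligned` are hypothetical helpers. The sketch assumes the third template argument is expressed in elements, so the alignment in bytes is `sizeof(T) * AlignSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for an Array type with an alignment template arg.
// Assumption: AlignSize counts elements, so the byte alignment is
// sizeof(T) * AlignSize (this product must be a power of two).
template <typename T, int N, int AlignSize = 1>
struct alignas(sizeof(T) * AlignSize) Array {
  T array[N];
  T& operator[](int i) { return array[i]; }
  const T& operator[](int i) const { return array[i]; }
};

// Hypothetical helper: byte alignment the compiler may assume for a
// pointer to Array<T, N, AlignSize>.
template <typename T, int N, int AlignSize>
constexpr std::size_t assumedAlignment() {
  return alignof(Array<T, N, AlignSize>);
}

// Hypothetical helper: runtime check that an address meets a byte alignment.
inline bool isAligned(const void* p, std::size_t bytes) {
  assert(bytes != 0);
  return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Default alignment arg: only the element's own alignment is assumed.
static_assert(assumedAlignment<std::uint16_t, 8, 1>() == 2,
              "align arg of 1 gives element alignment");
// Alignment arg of 8 elements: the type now promises 16-byte alignment.
static_assert(assumedAlignment<std::uint16_t, 8, 8>() == 16,
              "align arg of 8 gives 16-byte alignment");
```

With an alignment arg of 1, the type promises nothing beyond element alignment, even when the underlying buffer is in fact aligned more strictly. Passing the vectorized access width puts that guarantee into the type itself.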
NVIDIA · Oct 2, 2024 · 2db40b0
1 parent 37f6d54
Showing 1 changed file with 5 additions and 2 deletions.
diff --git a/csrc/codegen.cpp b/csrc/codegen.cpp
@@ -2964,10 +2964,13 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
       } else {
         indent() << "// Alias Allocation (changing dtype) - "
                  << alloc->memoryType() << "\n";
+        auto va = kernel_->summary().vectorized_accesses;
+        auto it = va.find(tv);
+        int64_t alias_alignment = it == va.end() ? 1 : it->second;
         indent() << "auto " << genVariableName(tv)
                  << " = *reinterpret_cast<Array<" << buffer_dtype << ", "
-                 << genInline(size) << ">*>(&" << genVariableName(alias_tv)
-                 << ");\n";
+                 << genInline(size) << ", " << alias_alignment << ">*>(&"
+                 << genVariableName(alias_tv) << ");\n";
         if (alloc->memoryType() == MemoryType::Local) {
           aligned_array_of_regs_.insert(tv);
         }