Specify different-dtype alias TV alignment (#3084)
See #2934 (comment)

PR #665 allowed us to re-use allocations that have different dtypes. We
already check that our aliased tensors do not have vectorized accesses
larger than those of the original tensors. However, when the dtypes
differ we `reinterpret_cast` the original buffer to a different `Array`
type. Previously we did not specify any alignment in that type's
template args, so it defaulted to an alignment of 1. Since the actual
addresses were all still aligned, this did not cause misaligned
accesses at runtime. This PR sets the alignment template arg to the
vectorized access width of the alias tensor, so that the compiler can
potentially apply optimizations that rely on knowing the address is
aligned.
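
To illustrate the effect of the alignment template argument, here is a minimal host-side sketch. It assumes a simplified, hypothetical stand-in for the runtime `Array` type in which the third parameter multiplies the element size to give the struct's alignment; the real definition may differ. `std::uint16_t` stands in for a 2-byte dtype such as half, and `Original`, `AliasBefore`, `AliasAfter`, and `isAligned` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the runtime Array type (an assumption, not the
// actual definition): AlignSize is the vectorized access width in elements,
// so the struct is aligned to sizeof(T) * AlignSize bytes.
template <typename T, int N, int AlignSize = 1>
struct alignas(sizeof(T) * AlignSize) Array {
  T data[N];
};

// Returns true when address p is a multiple of `alignment` bytes.
inline bool isAligned(const void* p, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Original allocation: 8 floats accessed with width-4 vectors (16 bytes).
using Original = Array<float, 8, 4>;
// Alias type without an alignment argument: only element-aligned.
using AliasBefore = Array<std::uint16_t, 16>;
// Alias type with the alias tensor's vector width (8 elements, 16 bytes).
using AliasAfter = Array<std::uint16_t, 16, 8>;

// Both alias types cover the same 32 bytes as the original buffer.
static_assert(sizeof(AliasBefore) == sizeof(Original), "same byte size");
static_assert(sizeof(AliasAfter) == sizeof(Original), "same byte size");
// Only the second alias type tells the compiler about 16-byte alignment.
static_assert(alignof(AliasBefore) == 2, "element alignment only");
static_assert(alignof(AliasAfter) == 16, "vector-width alignment");
```

Because the alias's vectorized width in bytes is never larger than the original's, the original buffer's address already satisfies the stricter alias type. The extra template argument only makes that guarantee visible in the type.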
jacobhinkle authored Oct 2, 2024
1 parent 37f6d54 commit 2db40b0
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions csrc/codegen.cpp
@@ -2964,10 +2964,13 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
 } else {
 indent() << "// Alias Allocation (changing dtype) - "
     << alloc->memoryType() << "\n";
+auto va = kernel_->summary().vectorized_accesses;
+auto it = va.find(tv);
+int64_t alias_alignment = it == va.end() ? 1 : it->second;
 indent() << "auto " << genVariableName(tv)
     << " = *reinterpret_cast<Array<" << buffer_dtype << ", "
-    << genInline(size) << ">*>(&" << genVariableName(alias_tv)
-    << ");\n";
+    << genInline(size) << ", " << alias_alignment << ">*>(&"
+    << genVariableName(alias_tv) << ");\n";
 if (alloc->memoryType() == MemoryType::Local) {
 aligned_array_of_regs_.insert(tv);
 }