Specify different-dtype alias TV alignment #3084

Merged (1 commit into main, Oct 2, 2024)
Conversation

jacobhinkle (Collaborator):

See #2934 (comment)

PR #665 allowed us to re-use allocations that have different dtypes. We already check that the aliased tensors do not have vectorized accesses larger than those of the original tensors. However, when the dtypes differ we reinterpret_cast the buffer to a different Array type. Previously we did not specify any alignment in that type's template args, so it assumed an alignment of 1. Since the actual addresses are all still aligned, this did not cause misaligned accesses at runtime. This PR sets the alignment template arg to the vectorized access width of the alias tensor, so that the compiler can potentially optimize knowing the address is aligned.
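
A minimal sketch of the before/after generated code, assuming a simplified stand-in for the runtime Array type (the real Array in the nvFuser runtime headers has more machinery, and the tensor names and the width of 4 below are made up for illustration):

// Simplified stand-in: the third template parameter is the alignment in
// elements; when omitted it defaults to 1, matching the pre-PR behavior.
template <typename T, int N, int Align = 1>
struct alignas(sizeof(T) * Align) Array {
  T array[N];
};

__global__ void alias_sketch(float* T2) {
  // Before: the alignment arg was omitted, so the Array type only claimed
  // element alignment even though the underlying address is more aligned.
  auto* T5_before = reinterpret_cast<Array<float, 4>*>(T2);
  // After: the alias TV's vectorized access width (4 here) is passed as the
  // alignment arg, telling the compiler each Array is 16-byte aligned.
  auto* T5_after = reinterpret_cast<Array<float, 4, 4>*>(T2);
  (void)T5_before;
  (void)T5_after;
}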

jacobhinkle (Collaborator, Author):

!build

jacobhinkle marked this pull request as ready for review on October 2, 2024 at 12:16.
@@ -2964,10 +2964,13 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
} else {
indent() << "// Alias Allocation (changing dtype) - "
<< alloc->memoryType() << "\n";
auto va = kernel_->summary().vectorized_accesses;
auto it = va.find(tv);
jacobhinkle (Collaborator, Author):

We could instead use the original TV, whose vectorized accesses are guaranteed to be at least as large as this 🤷‍♂️

naoyam (Collaborator):

That may not work, since the actual alignment size is sizeof(buffer_dtype) * alias_alignment. If we used the alignment of the original tensor, the computed alignment size would come out larger than the real one whenever the original tensor's dtype is smaller.
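
An illustrative numeric version of that concern (the dtypes and widths below are made up, not taken from the PR):

#include <cstddef>

// Alias buffer dtype float with alias vectorization width 4: the real
// guarantee is sizeof(float) * 4 = 16 bytes.
constexpr std::size_t actual_align_bytes = sizeof(float) * 4;

// If we instead plugged in the original tensor's width, say 8 (a 2-byte dtype
// vectorized by 8 also covers only 16 bytes), the generated type would claim
// sizeof(float) * 8 = 32-byte alignment, overstating the guarantee.
constexpr std::size_t claimed_align_bytes = sizeof(float) * 8;

static_assert(actual_align_bytes == 16 && claimed_align_bytes == 32,
              "illustrative arithmetic only");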

jacobhinkle (Collaborator, Author):

This is currently only used by "inner aliasing", in which the sizes of the dtypes must match, as well as the parallelization and shapes. I'll leave it as is, though, for the reason you stated: in the future we might allow mismatched shapes and dtype sizes subject to some other constraint.

naoyam (Collaborator):

OK, I see. What's more important here is that we should keep CudaCodeGen as trivial as possible and minimize its coupling with the lowering passes. It's difficult to keep track of all the assumptions made in the earlier phases of code translation. In this particular case, since the lowering passes the alignment size, that value should just be used as is. If some other valid value should be used instead, that decision should be made by the lowering and recorded in the kernel summary.
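
For reference, a hedged sketch of how the looked-up width from the hunk above might feed the alignment template argument (this is not the verbatim patch; the variable name and the fallback to 1 are assumptions):

// Continuing from `auto it = va.find(tv);` above: use the alias TV's
// vectorized access width as the Array alignment template argument, falling
// back to 1 (the previous implicit default) if the TV has no vectorized
// access recorded in the kernel summary.
int64_t alias_align = (it != va.end()) ? it->second : 1;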

naoyam (Collaborator) left a review:

LGTM

jacobhinkle merged commit 2db40b0 into main on Oct 2, 2024.
35 of 36 checks passed
jacobhinkle deleted the alias_alignment branch on October 2, 2024 at 16:11.