Alias allocations with same size elements despite different dtypes #665

Merged: 5 commits into main on Aug 1, 2023

Conversation

@jacobhinkle (Collaborator) commented Jul 31, 2023

Currently, the `reuseMemoryAllocations` lowering pass re-uses one TensorView's allocation for another only if the two have compatible shapes and vectorization and the same dtype. This PR replaces the same-dtype condition with a check that the element sizes (in bytes) match.
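
For concreteness, here is a minimal sketch of the changed condition; the struct, fields, and function name below are illustrative placeholders, not nvFuser's actual implementation:

```cpp
#include <cstdint>

// Illustrative buffer description; nvFuser's actual data structures differ.
struct BufferInfo {
  int64_t element_size_bytes;  // sizeof one element of the buffer's dtype
  int64_t size_in_elements;    // number of elements allocated
  int64_t vector_width;        // widest vectorized access, in elements
};

// Before this PR the buffers had to share the same dtype; now it is enough
// that each element occupies the same number of bytes, e.g. int64_t (8 bytes)
// can alias std::complex<float> (8 bytes).
bool canAlias(const BufferInfo& alias, const BufferInfo& original) {
  return alias.size_in_elements == original.size_in_elements &&
         alias.vector_width <= original.vector_width &&
         alias.element_size_bytes == original.element_size_bytes;
}
```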

Opportunities to re-use one buffer for another of a different dtype are somewhat rare in our test suite, but they do happen. For example, the case in FusionWelfordShmoo_CUDA corresponding to `testWelford(DataType::ComplexFloat, 1, 320, 256);` currently shows:

ptxas info    : Used 61 registers, 16 bytes smem, 464 bytes cmem[0]

With this PR, it becomes

ptxas info    : Used 59 registers, 16 bytes smem, 464 bytes cmem[0]

since we re-use two register TensorViews whose dtype changes from `int64_t` to `std::complex<float>`.
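
As a quick sanity check on the element sizes involved (standard C++ types shown as an illustration; the device-side types are assumed to match these sizes):

```cpp
#include <complex>
#include <cstdint>

static_assert(sizeof(int64_t) == 8, "int64_t occupies 8 bytes");
static_assert(sizeof(std::complex<float>) == 8,
              "std::complex<float> is two 4-byte floats");
```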

Re-using shared memory in these cases is likely always beneficial, but the effect of explicitly re-using registers is harder to predict since the compiler pipeline is so complicated. It is unlikely to be harmful in most cases, so the behavior is enabled by default; it can be disabled with `DisableOption::ReuseMismatchedTypeRegisters` or `NVFUSER_DISABLE=reuse_mismatched_type_registers`.

@jacobhinkle (Collaborator, Author) commented:
Assuming it is safe to do this, I am a bit surprised NVRTC doesn't find this opportunity by itself.

@jacobhinkle (Collaborator, Author) commented:
!build

@zasdfgbnm (Collaborator) commented Jul 31, 2023

Are you saying my claim in #221 about "reusing registers are useless" is wrong? This is 🤯.

@jacobhinkle (Collaborator, Author) commented:
> Are you saying my claim in #221 about "reusing registers are useless" is wrong? This is 🤯.

I think so! I was planning to try this for smem tensors, then noticed it also has an effect for registers. Maybe I am missing some reason this is unsafe, which would explain why it is not done automatically.

@jacobhinkle jacobhinkle marked this pull request as ready for review July 31, 2023 20:44
@jacobhinkle jacobhinkle merged commit b1477f2 into main Aug 1, 2023
@jacobhinkle jacobhinkle deleted the alias_different_dtypes branch August 1, 2023 11:45
jacobhinkle added a commit that referenced this pull request Oct 2, 2024
See #2934 (comment)

PR #665 allowed us to re-use allocations that have different dtypes. We
already check that the aliased tensors do not have vectorized accesses
larger than those of the original tensors. However, when the dtypes
differ we `reinterpret_cast` the buffer to a different `Array` type.
Previously we did not specify any alignment in that type's template
args, so it assumed an alignment of 1. Since the actual addresses are
all still aligned, this did not cause misaligned accesses at runtime.
This PR sets the alignment template arg to the vectorized access width
of the alias tensor, so that the compiler can, at least in principle,
optimize knowing the address is aligned.
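
As a rough illustration of the aliasing pattern this commit message describes, here is a standalone sketch. The `Array` template below is an assumption modeled loosely on the runtime array type mentioned above, not the actual generated code:

```cpp
// Standalone sketch: alias an int64_t buffer as std::complex<float> via
// reinterpret_cast, carrying an alignment hint in the Array template.
// This Array definition is hypothetical; real generated code differs.
#include <complex>
#include <cstdint>
#include <cstdio>

template <typename T, int N, int Align>
struct alignas(Align * sizeof(T)) Array {
  T data[N];
};

int main() {
  // Original register buffer: 8 int64_t elements, vectorization width 2,
  // i.e. 16-byte aligned.
  Array<int64_t, 8, 2> t_orig{};

  // Alias it as complex<float> (same 8-byte element size). Passing the
  // alias tensor's vector width as the alignment template argument lets
  // the compiler assume 16-byte alignment; with the previous default of 1
  // the access was still correct at runtime, just less informative to the
  // compiler.
  auto* t_alias = reinterpret_cast<Array<std::complex<float>, 8, 2>*>(&t_orig);

  std::printf("same address: %d\n", (void*)&t_orig == (void*)t_alias);
  return 0;
}
```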