Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QST] Question Regarding To The Use Of Swizzle #1927

Open
Yanksi opened this issue Nov 7, 2024 · 1 comment
Open

[QST] Question Regarding To The Use Of Swizzle #1927

Yanksi opened this issue Nov 7, 2024 · 1 comment

Comments

@Yanksi
Copy link

Yanksi commented Nov 7, 2024

When I was running the (code example)[https://github.com/user-attachments/files/17388059/sgemm_sm80_tmp.txt] provided by @ccecka in another issue, I got cudaDeviceSynchronize(): cudaErrorLaunchFailure: unspecified launch failure when running under the "TN" set up. With a bit of investigation, I found that the problem seems to occur due to the Swizzle layout used by the code.

By changing the swizzle_atom to composition(Swizzle<3,3,3>{}, Layout<Shape <_8, _32>, Stride<_32, _1>>{}) and bK to _32, the code runs without the previous mentioned error. However, I am not able to find any other setup that also works. And what it seems to be strange to me is that the value of bK also affect the correctness of the code, when I change bK to _64 while keeping swizzle_atom to be composition(Swizzle<3,3,3>{}, Layout<Shape <_8, _32>, Stride<_32, _1>>{}), the code also failed to work. Shouldn't tile_to_shape handle this kind of situation automatically?

@ccecka
Copy link

ccecka commented Nov 7, 2024

The original example that I provided gave me no errors. In that thread, we discuss all kinds of changes that are possible including swapping out the MMA, changing the SMEM layouts, and modifying the copy patterns.

You can inspect the Tensors/Layouts pre-partitioning and post-partitioning with print, print_tensor, and print_latex to sanity check any concerns. Yes, tile_to_shape should be working fine (and you can verify that by inspecting the output), but we would also need to see the configuration of the TiledMMAs, the TiledCopys, etc. Many static assertions in the original example code to catch common incompatibilities.

Despite our best efforts to check everything statically that can be checked, out-of-bounds accesses and run-time failures can still occur. This can often be due to problem sizes that are not a multiple of the tile sizes -- the example provided does not perform any predication.

Swizzle layouts are primarily a bank access pattern optimization, so if you believe that Swizzle is the problem, then test it for correctness without a Swizzle first. Only then, optimize any SMEM bank conflicts by using a swizzle pattern (and verifying the post-partitioned layouts still make sense).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants