[Software pipeline] Fix hardcoded index in access_ptr rewriting, add a GPU test with depth 4 #11495

Merged: 8 commits, May 28, 2022

@masahi (Member) commented May 27, 2022

Fix a hardcoded index in access_ptr rewriting. The old code assumed that the number of stages is always 2.
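
To illustrate why a hardcoded 2 breaks deeper pipelines, here is a minimal plain-Python sketch. It is not the TVM implementation: the function names, the stride value, and the assumption that each loop iteration selects a buffer copy by `loop_var % num_versions` are all hypothetical and chosen only for illustration.

```python
def versioned_offset(old_offset, loop_var, num_versions, version_stride):
    """Offset into a buffer expanded into `num_versions` copies.

    Hypothetical model: iteration `loop_var` uses copy `loop_var % num_versions`,
    and each copy occupies `version_stride` elements.
    """
    return old_offset + (loop_var % num_versions) * version_stride


def hardcoded_offset(old_offset, loop_var, version_stride):
    """Same computation with the version count fixed at 2 (the kind of bug described)."""
    return old_offset + (loop_var % 2) * version_stride


# With 4 buffer copies, the general form cycles through all four copies,
# while the hardcoded form only ever touches the first two.
STRIDE = 1024  # illustrative size of one buffer copy, in elements
general = [versioned_offset(0, i, 4, STRIDE) for i in range(6)]
hardcoded = [hardcoded_offset(0, i, STRIDE) for i in range(6)]
```

With two copies the two functions agree, which is presumably why such a bug goes unnoticed until a test uses a deeper pipeline.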

Refactored the MMA code in test_tir_schedule_tensorize_ldmatrix_mma.py so that other tests can reuse it. The new test in test_tir_transform_inject_software_pipeline.py applies software pipelining annotations to the MMA-tensorized schedule with software_pipeline_stage = [0, 0, 3], which pipelines the global-to-shared-memory load with depth 4. Without async copy this does not help performance, but it does demonstrate that a multi-stage pipeline with depth > 2 works on a semi-realistic GPU schedule.
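
As a rough picture of what stage annotations [0, 0, 3] mean, the following plain-Python sketch simulates which loop iteration each block works on at each step. It assumes a simplified model in which a block annotated with stage s runs original iteration `step - s`, and in which depth is the stage distance plus one. The helper names are hypothetical and this is not how the TVM pass is implemented.

```python
def pipeline_depth(stages):
    """Iterations in flight at once, assuming depth = max stage - min stage + 1."""
    return max(stages) - min(stages) + 1


def simulate_pipeline(stages, num_iters):
    """Return (step, block, iteration) triples for a software-pipelined loop.

    Simplified model: at pipelined step `step`, the block with stage `s`
    executes original iteration `step - s` if that iteration exists.
    Early steps form the prologue, late steps the epilogue.
    """
    max_stage = max(stages)
    trace = []
    for step in range(num_iters + max_stage):
        for block, stage in enumerate(stages):
            iteration = step - stage
            if 0 <= iteration < num_iters:
                trace.append((step, block, iteration))
    return trace


stages = [0, 0, 3]  # the two stage-0 blocks run 3 iterations ahead of the stage-3 block
trace = simulate_pipeline(stages, num_iters=5)
```

In this model, at step 3 the stage-0 blocks are already producing iteration 3 while the stage-3 block is only consuming iteration 0, so data for four iterations (0 through 3) must be live at once, matching the depth of 4.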

The test uses large dynamic shared memory, which serves as a test case for #11478.

@vinx13 @junrushao1994 @csullivan

@masahi changed the title from "[Software pipeline] Fix hardcoded index in access_ptr rewriting, add a GPU test with depth 3" to "[Software pipeline] Fix hardcoded index in access_ptr rewriting, add a GPU test with depth 4" on May 27, 2022
@masahi force-pushed the software-pipe-index-fix branch from b0b3a40 to 853a128 on May 27, 2022 20:36
@masahi merged commit 2389f1f into apache:main on May 28, 2022
2 participants