-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Fix][TIR] LowerThreadAllreduce warp reduction mask
The warp reduction implemented by "shuffle down" primitive takes a mask denoting the active threads within the warp that participate in this shuffle. Previously we compute the mask, while in practice we find that it results in "CUDA illegal instruction" error on NVIDIA H100 GPU when the mask is set, and the issue is gone if we do not update the mask. Therefore, this PR updates the allreduce lowering to remove the mask update. Confirmed the correctness on the following devices: * NVIDIA H100, * NVIDIA RTX 4090, * AMD Radeon 7900 XTX, * Apple M2 Ultra.
- Loading branch information
1 parent
99defd2
commit b04fcf1
Showing
2 changed files
with
4 additions
and
18 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters