-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arith][GPU]Rewrite simplify fix for Vectorized Cooperative Fetching #5924
Conversation
Thanks @jcf94 we should add a testcase to test_arith_rewrite_simplify, by constructing the case and
|
return broadcast(floordiv(b1, c2), lanes).Eval(); | ||
} | ||
// If all indices can be guaranteed to settle inside a coeff range | ||
if (c2val % bmod->coeff == 0 && bmod->base + (lanes.Eval() - 1) * c1val < bmod->coeff) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a unit test in tests_arith_rewrite_simplify to cover this rule.
Commemts are all addressed.
|
@jcf94 Did our old rule affect the correctness of common operators? |
Yes, with those rules several other UTs will fail.
Our rules make it to be
|
Thanks @jcf94 @merrymercy . this PR is now merged |
This pr is part of #5883 , fix for the rewrite_simplify error when doing vectorized cooperative fetching in some cases.
Code generated with bug is shown like this:
Which will finally lower to wrong CUDA C instructions.
This should be simplified to generate the correct RampNode:
Then main problems inside this expression are:
should be simplified to:
@merrymercy Our former simplify rules will cause some extra bug, I find a better way to fix this.
@tqchen For the UTs, I'm not sure if there's any better way to check if all of the inner AST blocks are RampNode, the current UTs I added can still pass even if the vectorize failed.
cc @minminsun @FrozenGene @yangjunpro