[Fix] Fix IndexDataTypeNormalizer to avoid redundant casting #13449

Merged

Conversation

MasterJH5574 (Contributor) commented Nov 21, 2022

This PR fixes the behavior of IndexDataTypeNormalizer on CastNode.

## Background

Consider the following case,

```python
A = te.placeholder((tir.IntImm("int64", 2), tir.IntImm("int64", 4)), name="A")
B = topi.reshape(A, (4, 2))
func = te.create_prim_func([A, B], index_dtype_override=None)
```

the generated PrimFunc is

```python
@T.prim_func
def func(A: T.Buffer[(T.int64(2), T.int64(4)), "float32"], T_reshape: T.Buffer[(4, 2), "float32"]):
    for i0, i1 in T.grid(4, 2):
        with T.block("T_reshape"):
            ax0, ax1 = T.axis.remap("SS", [i0, i1])
            T.reads(A[(T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(8) // T.int64(4), (T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(4)])
            T.writes(T_reshape[ax0, ax1])
            T_reshape[ax0, ax1] = A[(T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(8) // T.int64(4), (T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(4)]
```

Here the loop variables `ax0` and `ax1` have dtype int32, since the shape of the output buffer is in int32. On the other hand, the input buffer has its shape in int64. So, as the script above shows, CreatePrimFunc first casts the int32 variables to int64 and then uses them to access the input buffer.

Now if we use the option `index_dtype_override` to specify an index dtype as below,

```python
func = te.create_prim_func([A, B], index_dtype_override="int64")
```

the generated function will be

```python
@T.prim_func
def func(A: T.Buffer[(T.int64(2), T.int64(4)), "float32"], T_reshape: T.Buffer[(T.int64(4), T.int64(2)), "float32"]):
    for i0, i1 in T.grid(T.int64(4), T.int64(2)):
        with T.block("T_reshape"):
            ax0, ax1 = T.axis.remap("SS", [i0, i1])
            T.reads(A[(T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(8) // T.int64(4), (T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(4)])
            T.writes(T_reshape[ax0, ax1])
            T_reshape[ax0, ax1] = A[(T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(8) // T.int64(4), (T.Cast("int64", ax0) * T.int64(2) + T.Cast("int64", ax1)) % T.int64(4)]
```

Note that even though all variables and buffer shapes now have dtype int64, there are still CastNodes such as `T.Cast("int64", ax0)`, where `ax0` is already an int64 variable. We don't want such redundant casting.
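With the redundant casts removed, the same function should reduce to the cast-free form below (a sketch of the desired result, obtained simply by dropping the casts from the function above; the exact printed output may differ slightly):

```python
@T.prim_func
def func(A: T.Buffer[(T.int64(2), T.int64(4)), "float32"], T_reshape: T.Buffer[(T.int64(4), T.int64(2)), "float32"]):
    for i0, i1 in T.grid(T.int64(4), T.int64(2)):
        with T.block("T_reshape"):
            ax0, ax1 = T.axis.remap("SS", [i0, i1])
            T.reads(A[(ax0 * T.int64(2) + ax1) % T.int64(8) // T.int64(4), (ax0 * T.int64(2) + ax1) % T.int64(4)])
            T.writes(T_reshape[ax0, ax1])
            T_reshape[ax0, ax1] = A[(ax0 * T.int64(2) + ax1) % T.int64(8) // T.int64(4), (ax0 * T.int64(2) + ax1) % T.int64(4)]
```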

## Fix

To fix the issue above, this PR overrides the `VisitExpr_(const CastNode* cast)` method in IndexDataTypeNormalizer. When the `value` field of a CastNode already has the target dtype, we no longer cast it.
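The change itself lives in the C++ IndexDataTypeNormalizer, but its effect can be sanity-checked from Python by walking the generated PrimFunc and asserting that no cast remains whose operand already has the target dtype. A minimal sketch, assuming the public visitor utility `tir.stmt_functor.post_order_visit`:

```python
from tvm import te, tir, topi

A = te.placeholder((tir.IntImm("int64", 2), tir.IntImm("int64", 4)), name="A")
B = topi.reshape(A, (4, 2))
func = te.create_prim_func([A, B], index_dtype_override="int64")

redundant_casts = []

def _collect(node):
    # A cast is redundant when its operand already carries the target dtype.
    if isinstance(node, tir.Cast) and node.value.dtype == node.dtype:
        redundant_casts.append(node)

tir.stmt_functor.post_order_visit(func.body, _collect)
assert not redundant_casts, "generated PrimFunc still contains redundant casts"
```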


cc @vinx13 @junrushao

tvm-bot (Collaborator) commented Nov 21, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: `fix`. See #10317 for details.
  • Built docs for commit 711fd3c can be found here.

Generated by tvm-bot

github-actions bot requested review from junrushao and vinx13 on November 21, 2022 00:39
junrushao (Member) commented:

CC: @vinx13 for confirmation

MasterJH5574 force-pushed the bugfix/2022-11-20-index-dtype-normalizer branch from 48044ce to 711fd3c on November 21, 2022 00:40
junrushao merged commit d663207 into apache:main Nov 21, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
masahi added a commit to masahi/tvm that referenced this pull request Nov 25, 2022
masahi added a commit to masahi/tvm that referenced this pull request Nov 25, 2022
masahi added a commit to masahi/tvm that referenced this pull request Nov 29, 2022
masahi added a commit to masahi/tvm that referenced this pull request Nov 29, 2022
masahi added a commit to masahi/tvm that referenced this pull request Dec 1, 2022
masahi added a commit to masahi/tvm that referenced this pull request Dec 1, 2022
masahi added a commit to masahi/tvm that referenced this pull request Dec 2, 2022
masahi added a commit to masahi/tvm that referenced this pull request Dec 2, 2022
MasterJH5574 added a commit to mlc-ai/relax that referenced this pull request Dec 24, 2022
tqchen pushed a commit to mlc-ai/relax that referenced this pull request Dec 30, 2022
MasterJH5574 added a commit to mlc-ai/relax that referenced this pull request Jan 9, 2023
MasterJH5574 added a commit to MasterJH5574/tlc-relax that referenced this pull request Jan 11, 2023
MasterJH5574 added a commit to MasterJH5574/tlc-relax that referenced this pull request Jan 11, 2023
MasterJH5574 added a commit to MasterJH5574/tlc-relax that referenced this pull request Jan 11, 2023
tqchen pushed a commit to tlc-pack/relax that referenced this pull request Jan 11, 2023