
Fixing performance issue in PassUpDomain when fusing and splitting axes #3073

Merged

bulanova-huawei
Contributor

This fixes #3072

Currently, the bounding box for a fused parameter in PassUpDomain is the full extent of the outer and inner dimensions of the output tensor.
The proposed change computes a tight bound for the outer dimension. If the fused IntSet always fits inside the inner dimension, a tight bound for the inner dimension can be computed as well. Otherwise, the full extent of the inner dimension is used.
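The rule above can be sketched for a single interval as follows. This sketch assumes concrete integer intervals, whereas the real pass works on symbolic IntSets. The function name pass_up_fused is hypothetical. A fused index is f = outer * inner_extent + inner. It is mapped back to an outer interval, and the inner interval is tightened only when the whole fused interval stays inside one row.

```python
def pass_up_fused(lo, hi, inner_extent):
    """Map a fused interval [lo, hi] (inclusive) back to (outer, inner) intervals.

    Hypothetical sketch for fused = outer * inner_extent + inner, using concrete
    integers instead of TVM's symbolic IntSet.
    Returns ((outer_lo, outer_hi), (inner_lo, inner_hi)), both inclusive.
    """
    outer_lo, outer_hi = lo // inner_extent, hi // inner_extent
    if outer_lo == outer_hi:
        # The whole interval stays inside one row, so the inner bound is tight too.
        inner = (lo % inner_extent, hi % inner_extent)
    else:
        # The interval wraps across a row boundary: fall back to the full inner extent.
        inner = (0, inner_extent - 1)
    return (outer_lo, outer_hi), inner


print(pass_up_fused(0, 11, 6))  # 12 elements starting at column 0
print(pass_up_fused(3, 5, 6))   # 3 elements that stay within row 0
```

Only the outer interval depends on where the chunk starts. The inner interval falls back to the full row as soon as one chunk can cross a row boundary.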

For example, fuse the two axes of a 2-D tensor with shape (12, 6) and then split with factor 12. The outer and inner bounds will then be (2, 6).

With a split factor of 9, the bounds are the same. Every chunk of 9 consecutive elements spans 2 rows of the (12, 6) tensor, so some redundant computation occurs.

With a split factor of 3, redundant computation can again be avoided by using a (1, 3) shape.
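The shapes quoted in the example can be checked by brute force. The helper max_chunk_box below is hypothetical and is not part of TVM. It enumerates every outer iteration of fuse followed by split over a 2-D tensor and reports the largest (rows, columns) bounding box that any chunk touches. It does not reproduce the PR's symbolic computation.

```python
def max_chunk_box(shape, factor):
    """Largest (rows, cols) bounding box touched by any chunk of
    fuse(i, j) followed by split(factor) on a 2-D tensor of `shape`."""
    rows, cols = shape
    total = rows * cols
    n_outer = (total + factor - 1) // factor  # ceil(total / factor) outer iterations
    best_rows, best_cols = 0, 0
    for outer in range(n_outer):
        start = outer * factor
        stop = min(start + factor, total)  # the last chunk may be partial
        fused = range(start, stop)
        r = [f // cols for f in fused]  # row index i of each fused element
        c = [f % cols for f in fused]   # column index j of each fused element
        best_rows = max(best_rows, max(r) - min(r) + 1)
        best_cols = max(best_cols, max(c) - min(c) + 1)
    return best_rows, best_cols


for factor in (12, 9, 3, 10):
    print(factor, max_chunk_box((12, 6), factor))
```

This prints the following boxes:

- Factor 12 gives (2, 6).
- Factor 9 gives (2, 6).
- Factor 3 gives (1, 3).
- Factor 10, used in the reproduction further down, gives (3, 6).

The first three match the example.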

@bulanova-huawei bulanova-huawei changed the title Fixing performance issue in PassUpDomain when fusing and splitting axes [WIP] Fixing performance issue in PassUpDomain when fusing and splitting axes Apr 22, 2019
@bulanova-huawei
Contributor Author

There is a problem related to #1139:

```python
import tvm
m = tvm.convert(12)
l = tvm.convert(6)
A = tvm.placeholder((m, l), name='A')
A1 = tvm.compute((m, l), lambda i, j: A[i, j], name='A1')
A2 = tvm.compute((m, l), lambda i, j: A1[i, j] + 3, name='A2')
s = tvm.create_schedule(A2.op)

fused_axes = s[A2].fuse(A2.op.axis[0], A2.op.axis[1])
xo, xi = s[A2].split(fused_axes, 10)
s[A1].compute_at(s[A2], xo)

print(tvm.lower(s, [A, A2], simple_mode=True))
```

produces:

```
// attr [A1] storage_scope = "global"
allocate A1[float32 * (((((i.j.fused.outer*10) % 6) + 15)/6)*6)]
produce A2 {
  for (i.j.fused.outer, 0, 8) {
    produce A1 {
      for (i, 0, ((((i.j.fused.outer*10) % 6) + 15)/6)) {
        for (j, 0, 6) {
          if (likely((((i.j.fused.outer*10)/6) < (12 - i)))) {
            A1[((i*6) + j)] = A[(((((i.j.fused.outer*10)/6) + i)*6) + j)]
          }
        }
      }
    }
    for (i.j.fused.inner, 0, 10) {
      if (likely(((i.j.fused.outer*10) < (72 - i.j.fused.inner)))) {
        A2[((i.j.fused.outer*10) + i.j.fused.inner)] = (A1[(((i.j.fused.outer*10) + i.j.fused.inner) - (((i.j.fused.outer*10)/6)*6))] + 3.000000f)
      }
    }
  }
}
```

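The allocation size depends on the loop variable i.j.fused.outer. That variable is referenced in the allocate statement before the loop that defines it.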

With simple_mode=False the code throws an error.
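To see why this allocation has no fixed size, evaluate the row extent from the allocate line in plain Python for each of the 8 outer iterations. The helper names are hypothetical. The worst-case bound at the end is only one possible loop-invariant choice. It is an assumption for illustration, not what this PR or #1139 settled on.

```python
FACTOR, COLS, OUTER_EXTENT = 10, 6, 8  # split factor, inner axis extent, outer trip count


def a1_rows(outer):
    # Row extent of A1, transcribed from the lowered IR:
    # (((outer*10) % 6) + 15)/6 == ceil((start_column + FACTOR) / COLS)
    return ((outer * FACTOR) % COLS + FACTOR + COLS - 1) // COLS


def worst_case_rows():
    # Hypothetical loop-invariant bound: assume the chunk starts at the last
    # column (COLS - 1), which upper-bounds a1_rows for every outer value.
    return ((COLS - 1) + FACTOR + COLS - 1) // COLS


extents = [a1_rows(o) for o in range(OUTER_EXTENT)]
print(extents)            # the row extent changes with the loop variable
print(worst_case_rows())  # one size that is safe for every iteration
```

The per-iteration extent alternates between 2 and 3 rows, so an allocate placed outside the loop cannot use it directly. Sizing for the worst case (3 rows, 18 floats) would give a size that does not depend on i.j.fused.outer.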

@drcut drcut left a comment

It can help a lot in reducing unnecessary computation, and the new bound rule is correct and tight. I think it can be merged into master as long as it solves the problem of the generated allocation depending on the block/thread size.

@bulanova-huawei
Contributor Author

I think this should work if #1139 gets fixed.
If that one is not a priority, I had some thoughts on how to deal with it, but I wanted to get some input from the community. Here is the post: https://discuss.tvm.ai/t/problem-with-allocate-size-depending-on-loop-variable-while-trying-to-improve-bounds-in-passupdomain/2356

@tqchen @wweic

@tqchen
Member

tqchen commented Jul 10, 2019

Sorry for the delayed action on this. @bulanova-huawei, can you rebase against master and update the PR? Please also make sure CI passes.

@tqchen tqchen self-assigned this Jul 10, 2019
@bulanova-huawei
Contributor Author

@tqchen I added a CI test that demonstrates the problem of the allocation size depending on a loop variable before that variable is defined. I can remove the second commit, which contains this test, but the problem will still be there.

@tqchen
Member

tqchen commented Jul 11, 2019

We need to make sure CI is always green. Any further fixes that solve future problems should go in a separate PR.

@bulanova-huawei bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from dd6c970 to 9ee7dbd July 12, 2019 03:14
@bulanova-huawei bulanova-huawei changed the title [WIP] Fixing performance issue in PassUpDomain when fusing and splitting axes Fixing performance issue in PassUpDomain when fusing and splitting axes Jul 12, 2019
@bulanova-huawei
Contributor Author

Please review @merrymercy @sgrechanik-h

@bulanova-huawei bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from 9ee7dbd to 4f3a032 July 13, 2019 03:42
Contributor

@wweic wweic left a comment

Thanks! LGTM, just a cosmetic comment.

src/schedule/message_passing.cc
tests/python/unittest/test_schedule_bound_inference.py
@tqchen
Member

tqchen commented Jul 14, 2019

@bulanova-huawei, can you rebase again? Sorry about the request, but there is a conflict with the latest merge that we need to resolve.

Apply suggestions from code review

Co-Authored-By: Wei Chen <[email protected]>
@bulanova-huawei bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from 469dad0 to 010f217 July 14, 2019 23:32
@tqchen tqchen merged commit 54f903a into apache:master Jul 18, 2019
@tqchen
Member

tqchen commented Jul 18, 2019

Thanks @bulanova-huawei @wweic, this PR is now merged.

wweic added a commit to wweic/tvm that referenced this pull request Aug 9, 2019
wweic added a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019
Successfully merging this pull request may close these issues.

Performance issue in PassUpDomain when fusing and splitting axes.
4 participants