Fixing performance issue in PassUpDomain when fusing and splitting axes #3073

bulanova-huawei · 2019-04-22T21:38:14Z

This fixes #3072

Currently, the bounding box computed for a fused parameter in PassUpDomain is the full extent of the outer and inner dimensions of the output tensor.
The proposed change computes tight bounds for the outer dimension. If the fused IntSet always fits inside the inner dimension, a tight bound for the inner dimension can also be computed. Otherwise, the full extent of the inner dimension is used.

For example, if the 2 axes of a 2D tensor with shape (12, 6) are fused and then split with factor 12, the outer and inner bounds will have shape (2, 6). If the fused axis is split with factor 9, the bounds will be the same, because 9 elements can wrap around 2 rows of (12, 6), and some redundant computation will occur. If the split factor is 3, we can again avoid redundant computation by using a (1, 3) shape.
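
The arithmetic in this example can be checked with a small standalone sketch. It is plain Python, not the TVM implementation in message_passing.cc: the helper name `fused_split_bounds` is hypothetical, and it brute-forces over every chunk of the fused axis instead of reasoning symbolically about IntSets as the actual pass does. It reports the largest box needed by any single outer iteration.

```python
def fused_split_bounds(rows, cols, factor):
    """Hypothetical helper: tightest (outer, inner) box that covers any one
    chunk of `factor` consecutive elements of the fused (rows * cols) axis."""
    total = rows * cols
    max_rows, max_cols = 0, 0
    for start in range(0, total, factor):
        end = min(start + factor, total) - 1  # last fused index in this chunk
        n_rows = end // cols - start // cols + 1
        if n_rows == 1:
            # The chunk stays inside one row, so the inner bound can be tight.
            n_cols = end % cols - start % cols + 1
        else:
            # The chunk wraps across rows, so fall back to the full inner extent.
            n_cols = cols
        max_rows = max(max_rows, n_rows)
        max_cols = max(max_cols, n_cols)
    return max_rows, max_cols


for factor in (12, 9, 3):
    print(factor, fused_split_bounds(12, 6, factor))
```

For the (12, 6) tensor this gives (2, 6) for factors 12 and 9, and (1, 3) for factor 3, matching the shapes described above.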

bulanova-huawei · 2019-04-23T16:25:35Z

There is a problem related to #1139:

import tvm
m = tvm.convert(12)
l = tvm.convert(6)
A = tvm.placeholder((m, l), name='A')
A1 = tvm.compute((m, l), lambda i, j: A[i, j], name='A1')
A2 = tvm.compute((m, l), lambda i, j: A1[i, j] + 3, name='A2')
s = tvm.create_schedule(A2.op)

fused_axes = s[A2].fuse(A2.op.axis[0], A2.op.axis[1])
xo, xi = s[A2].split(fused_axes, 10)
s[A1].compute_at(s[A2], xo)

print(tvm.lower(s, [A, A2], simple_mode=True))

produces:

// attr [A1] storage_scope = "global"
allocate A1[float32 * (((((i.j.fused.outer*10) % 6) + 15)/6)*6)]
produce A2 {
  for (i.j.fused.outer, 0, 8) {
    produce A1 {
      for (i, 0, ((((i.j.fused.outer*10) % 6) + 15)/6)) {
        for (j, 0, 6) {
          if (likely((((i.j.fused.outer*10)/6) < (12 - i)))) {
            A1[((i*6) + j)] = A[(((((i.j.fused.outer*10)/6) + i)*6) + j)]
          }
        }
      }
    }
    for (i.j.fused.inner, 0, 10) {
      if (likely(((i.j.fused.outer*10) < (72 - i.j.fused.inner)))) {
        A2[((i.j.fused.outer*10) + i.j.fused.inner)] = (A1[(((i.j.fused.outer*10) + i.j.fused.inner) - (((i.j.fused.outer*10)/6)*6))] + 3.000000f)
      }
    }
  }
}

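The allocate size of A1 depends on the loop variable i.j.fused.outer, which is only defined inside the loop that follows the allocate statement.

With simple_mode=False, lowering throws an error.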
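
To make the dependency concrete, the extent expression from the allocate line above can be evaluated for each value of i.j.fused.outer. This is a sketch in plain Python that only mirrors the printed expression and does not call TVM. The function name `a1_rows` is hypothetical, and the printed `/` is assumed to be integer division, which is safe here because all operands are non-negative.

```python
def a1_rows(outer):
    """Row extent of A1 for one outer iteration, copied from the lowered IR:
    ((((i.j.fused.outer*10) % 6) + 15)/6)."""
    return ((outer * 10) % 6 + 15) // 6


# The outer loop runs for 8 iterations (72 fused elements, split factor 10).
sizes = [a1_rows(outer) * 6 for outer in range(8)]
print(sizes)       # number of A1 elements requested per outer iteration
print(max(sizes))  # a loop-invariant upper bound on that size
```

The requested size alternates between 12 and 18 elements depending on where a chunk starts within a row, so the size cannot be fixed at the allocate statement without either bounding it (18 here) or moving the allocation inside the loop.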

drcut

This can help a lot in reducing unnecessary computation, and the new bound rule is correct and tight. I think it can be merged into master as long as it solves the problem of the generated allocate depending on the block/thread size.

bulanova-huawei · 2019-06-27T20:32:31Z

I think this should work if #1139 gets fixed.
If that one is not a priority, I had some thoughts on how to deal with it, but I wanted to get some input from the community. Here is the post: https://discuss.tvm.ai/t/problem-with-allocate-size-depending-on-loop-variable-while-trying-to-improve-bounds-in-passupdomain/2356

@tqchen @wweic

tqchen · 2019-07-10T17:06:52Z

Sorry for the delayed action on this. @bulanova-huawei can you rebase against master and update the PR? Please also make sure CI passes.

bulanova-huawei · 2019-07-11T21:31:03Z

@tqchen I added a CI test that demonstrates the problem with alloc size depending on a loop variable before it is defined. I can remove the second commit with this test, but the problem will still be there.

tqchen · 2019-07-11T21:41:04Z

We need to make sure CI is always green, so if there are further fixes that solve future problems, we should keep them in a separate PR.

bulanova-huawei · 2019-07-12T18:56:43Z

Please review @merrymercy @sgrechanik-h

wweic

Thanks! LGTM, just a cosmetic comment.

src/schedule/message_passing.cc

tests/python/unittest/test_schedule_bound_inference.py

tqchen · 2019-07-14T19:35:03Z

@bulanova-huawei can you rebase again? Sorry about the request, but there is a conflict we need to resolve with respect to the latest merge.

tqchen · 2019-07-18T00:15:23Z

Thanks @bulanova-huawei @wweic, this PR is now merged.

bulanova-huawei changed the title ~~Fixing performance issue in PassUpDomain when fusing and splitting axes~~ [WIP] Fixing performance issue in PassUpDomain when fusing and splitting axes Apr 22, 2019

drcut approved these changes Jun 27, 2019

tqchen added the status: need review label Jul 10, 2019

tqchen self-assigned this Jul 10, 2019

bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from dd6c970 to 9ee7dbd Compare July 12, 2019 03:14

bulanova-huawei changed the title ~~[WIP] Fixing performance issue in PassUpDomain when fusing and splitting axes~~ Fixing performance issue in PassUpDomain when fusing and splitting axes Jul 12, 2019

bulanova-huawei closed this Jul 13, 2019

bulanova-huawei reopened this Jul 13, 2019

bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from 9ee7dbd to 4f3a032 Compare July 13, 2019 03:42

wweic requested changes Jul 13, 2019

wweic approved these changes Jul 14, 2019

tightening bounding box for IntSet fused in PassUpDomain

010f217

Apply suggestions from code review Co-Authored-By: Wei Chen <[email protected]>

bulanova-huawei force-pushed the tighten_bounding_box_in_PassUpDomain branch from 469dad0 to 010f217 Compare July 14, 2019 23:32

tqchen merged commit 54f903a into apache:master Jul 18, 2019

tqchen added status: accepted and removed status: need review labels Jul 18, 2019

merrymercy mentioned this pull request Jul 31, 2019

compute_at after fused & split,the result is wrong #3679

Closed

wweic added a commit to wweic/tvm that referenced this pull request Aug 9, 2019

tightening bounding box for IntSet fused in PassUpDomain (apache#3073)

fe98c4f

Apply suggestions from code review Co-Authored-By: Wei Chen <[email protected]>

wweic added a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019

tightening bounding box for IntSet fused in PassUpDomain (apache#3073)

a4e4c01

Apply suggestions from code review Co-Authored-By: Wei Chen <[email protected]>

yzhliu mentioned this pull request Nov 11, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed
