Performance issue in PassUpDomain when fusing and splitting axes. #3072

Closed
bulanova-huawei opened this issue Apr 22, 2019 · 0 comments · Fixed by #3073

bulanova-huawei commented Apr 22, 2019

PassUpDomain estimates the bounding box for an IntSet too conservatively when axes are first fused and then split.
Code to reproduce:

import tvm
m = tvm.convert(12)
l = tvm.convert(6)
A = tvm.placeholder((m, l), name='A')
A1 = tvm.compute((m, l), lambda i, j: A[i, j], name='A1')
A2 = tvm.compute((m, l), lambda i, j: A1[i, j] + 3, name='A2')

s = tvm.create_schedule(A2.op)
fused_axes = s[A2].fuse(A2.op.axis[0], A2.op.axis[1])
xo, xi = s[A2].split(fused_axes, 12)
s[A1].compute_at(s[A2], xo)


print(tvm.lower(s, [A, A1, A2], simple_mode=True))

produces

produce A2 {
  for (i.j.fused.outer, 0, 6) {
    produce A1 {
      for (i, 0, 12) {
        for (j, 0, 6) {
          A1[((i*6) + j)] = A[((i*6) + j)]
        }
      }
    }
    for (i.j.fused.inner, 0, 12) {
      A2[((i.j.fused.outer*12) + i.j.fused.inner)] = (A1[((i.j.fused.outer*12) + i.j.fused.inner)] + 3.000000f)
    }
  }
}

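Note that the whole tensor A1 (12 x 6 elements) is realized at each iteration of i.j.fused.outer, even though each iteration only reads 12 consecutive elements of A1, i.e. two rows.
A more efficient result would be: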

produce A2 {
  for (i.j.fused.outer, 0, 6) {
    produce A1 {
      for (i, 0, 2) {
        for (j, 0, 6) {
          A1[((((i.j.fused.outer*2) + i)*6) + j)] = A[((((i.j.fused.outer*2) + i)*6) + j)]
        }
      }
    }
    for (i.j.fused.inner, 0, 12) {
      A2[((i.j.fused.outer*12) + i.j.fused.inner)] = (A1[((i.j.fused.outer*12) + i.j.fused.inner)] + 3.000000f)
    }
  }
}
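
To make the expected bounds concrete, here is a minimal plain-Python sketch (it does not use TVM or PassUpDomain itself) that enumerates which (i, j) elements of A1 each iteration of i.j.fused.outer reads. The helper name `touched_box` and the constants are hypothetical and chosen only for illustration. It assumes the shapes and split factor from the example above; the result is this clean only because the split factor 12 is a multiple of the inner extent 6.

```python
# Plain-Python sketch; no TVM required. All names here are hypothetical.
M, L = 12, 6          # shape of A1, as in the example above
FACTOR = 12           # split factor applied to the fused axis


def touched_box(outer, factor=FACTOR, n_cols=L):
    """Return ((i_min, i_max), (j_min, j_max)) of A1 read by one outer iteration."""
    rows, cols = [], []
    for inner in range(factor):
        fused = outer * factor + inner   # index on the fused i.j axis
        rows.append(fused // n_cols)     # recover i from the fused index
        cols.append(fused % n_cols)      # recover j from the fused index
    return (min(rows), max(rows)), (min(cols), max(cols))


n_outer = (M * L) // FACTOR              # 6 iterations of i.j.fused.outer
for outer in range(n_outer):
    i_range, j_range = touched_box(outer)
    print(outer, "i in", i_range, "j in", j_range)

# Total number of A1 elements computed under each bound
conservative_work = n_outer * M * L      # whole tensor per outer iteration
tight_work = n_outer * 2 * L             # two rows per outer iteration
```

Each outer iteration touches only rows 2*outer and 2*outer + 1, which matches the `for (i, 0, 2)` loop in the desired output; the conservative bound computes six times as many A1 elements overall.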

Related discussions:
https://discuss.tvm.ai/t/discuss-contributing-new-docs-for-inferbound/2151/9
https://discuss.tvm.ai/t/tensorize-which-use-case-is-correct/2140/4
