Fix broken loop partitioning due to recent changes. #4243

kimishpatel · 2019-11-01T06:20:16Z

factors and resulting nested loop is broken.
This is due to the fact that we are creating zero extent loops which
are fixed afterwards. However unroll pass breaks due to the zero extent
loop.

Specifically this is what happens:
Recent changes in PR, #3734 , broke some of the loop partitioning logic. Particularly take for example:

  def test_multilevel_splitting_with_indivisble_factors():
      import topi
      A = tvm.placeholder((130,), dtype="float32")
      B = topi.nn.relu(A)
      s = tvm.create_schedule(B.op)
      (y,) = s[B].op.axis
      (yo, yi) = s[B].split(y, factor=8)
      (yoo, yoi) = s[B].split(yo, factor=16)
      s[B].reorder(yoo, yoi, yi)
      #s[B].reorder(yo, yi)
      s[B].unroll(yi)

      with tvm.build_config(partition_const_loop=True):
          lowered_body = tvm.lower(s, [A, B]).body
          print("-----------lowerd body----------")
          print(lowered_body)
          print("-----------lowerd body----------")
          def visit_stmt(op):
              return(isinstance(op, tvm.expr.Max))
          num_max = collect_visit(lowered_body, visit_stmt)
          assert num_max.count(True) == 10

This fails mainly due to removal of the conditions such as:
https://github.com/dmlc/tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6L516
The issue is that, now we can potentially create zero extent post and pre loop bodies. If we have nested loops as a result of loop partitioning then, the zero extent loops can have nested loop which are further partitioned. In this particular example the nested loop is generated with outer loop's extent of 0.
This results in inner loop: unrolled (i0.inner, 0, min(((0 - (i0.outer.inner*8)) - 6), 8)). With this unroll pass breaks with the following error:
TVMError: Check failed: value >= 0 (-1 vs. 0) : Cannot unroll non-constant loop

However if we factor the loop only once things work fine. So the following example works.

  def test_multilevel_splitting_with_indivisble_factors():
      import topi
      A = tvm.placeholder((130,), dtype="float32")
      B = topi.nn.relu(A)
      s = tvm.create_schedule(B.op)
      (y,) = s[B].op.axis
      (yo, yi) = s[B].split(y, factor=8)
      #(yoo, yoi) = s[B].split(yo, factor=16)
      #s[B].reorder(yoo, yoi, yi)
      s[B].reorder(yo, yi)
      s[B].unroll(yi)

      with tvm.build_config(partition_const_loop=True):
          lowered_body = tvm.lower(s, [A, B]).body
          print("-----------lowerd body----------")
          print(lowered_body)
          print("-----------lowerd body----------")
          def visit_stmt(op):
              return(isinstance(op, tvm.expr.Max))
          num_max = collect_visit(lowered_body, visit_stmt)
          assert num_max.count(True) == 10

I suggest we revert changes such that we no longer generate post and pre loops of zero extent.
This PR does that.

kimishpatel · 2019-11-01T06:21:26Z

CC: @umangyadav @ZihengJiang @tqchen.

umangyadav · 2019-11-01T12:40:48Z

@kimishpatel I understand the point and it is completely valid.

I suggest an alternate way, that is how about adding a call to RemoveNoOp pass in the VisitAndMutate here? That is before recursing, we clean up the zero loops?
Or other way maybe we can do both, keeping the guard conditions and having RemoveNoOp?

https://github.com/dmlc/tvm/blob/a3ca1a4da071f3aadffeafa6d22b43d5addde191/src/pass/loop_partition.cc#L315

If we use CanProve(body_begin == min) and if it fails, then we failed to "prove" the equality and we have not eliminated the possibility of zero extent loops. So, there is still the possibility of recursing on zero extent loops.

umangyadav · 2019-11-01T13:12:44Z

src/pass/loop_partition.cc

@@ -513,17 +513,19 @@ Stmt LoopPartitioner::TryPartition(const Node* node,
  bool pre_stmt_recurse = true;
  if (middle_interval_i->HasLowerBound()) {
    body_begin = ir::Simplify(middle_interval.min());
-    Expr cond = (body_begin - min >= 0);


@kimishpatel
we can also think about changing this condition to body_begin - min > 0 instead of using >= that can also prevent recursing on zero extent loops.

@umangyadav, we can do this one, however we still generate zero extent loop. Just that we dont recursively partition anything generated for zero extent loop.
I still prefer to generate cleaner output that has no such zero extent loops that need fixing afterwards, unless somehow we fix it in the loop partition itself. This can be done as you suggested via RemoveNoOp however it seems little convoluted way to go about this. I would rather have loop partition generate clean IR with valid loop iterations.
Is there a reason why you would prefer to generate zero extent loop?

@kimishpatel There is no specific reason or preference for generating the zero extent loops.

My point is that there is always possibility of having the zero extent loops.

We need to make sure not to recurse on those loops by having some mechanism e.g either guard conditions, RemoveNoOp etc.

@umangyadav, I understand. Do you have specific example where we can still generate zero extent loops despite this PR? Then it will be easier to see what would be a better fix.

@umangyadav I agree we might need a cleanup pass. But if we can avoid generating zero extent loops, it's desirable.

umangyadav · 2019-11-01T13:39:42Z

Another thing we can add is a call to Simplify with range info in the VisitAndMutate to before RemoveNoOp to make the job of later one easier.
e.g

Stmt VisitAndMutate(Stmt& stmt) {
 stmt = Simplify(stmt);
 stmt = RemoveNoOp(stmt);
 selector.visit(stmt);
return Mutate(stmt);
}

kimishpatel · 2019-11-05T05:49:27Z

@ZihengJiang ping.

ZihengJiang · 2019-11-08T00:17:11Z

Hi @kimishpatel , the #3734 is for avoiding generating the tail loop with extent one which is mentioned in #3733. Do you think whether we have way to solve the zero-extent loop problem while keeping avoiding the extent one tail loop issue? Also, it would be great if you can post the printing result of your example.

kimishpatel · 2019-11-08T06:38:39Z

@ZihengJiang, this PR keeps the fixes for tails loop from #3734. The changes here https://github.com/apache/incubator-tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6R544, for post_doubt_begin is still preserved. The only thing changes is about generating zero extent loop or not.
I will post the output of the script I mentioned. Without the fix of this PR it fails in the unroll_loop pass where the unroll extent is determined to be -1. I will post the output with the fix.

kimishpatel · 2019-11-08T06:50:56Z

@ZihengJiang
Output without fix:

F1107 22:50:11.396204 3352620 unroll_loop.cc:85] Check failed: value >= 0 (-1 vs. 0) Cannot unroll non-constant loop
*** Check failure stack trace: ***

Output with this PR:

produce compute {
  for (i0.outer.inner, 0, 16) {
    compute[(i0.outer.inner*8)] = max(placeholder[(i0.outer.inner*8)], 0f)
    compute[((i0.outer.inner*8) + 1)] = max(placeholder[((i0.outer.inner*8) + 1)], 0f)
    compute[((i0.outer.inner*8) + 2)] = max(placeholder[((i0.outer.inner*8) + 2)], 0f)
    compute[((i0.outer.inner*8) + 3)] = max(placeholder[((i0.outer.inner*8) + 3)], 0f)
    compute[((i0.outer.inner*8) + 4)] = max(placeholder[((i0.outer.inner*8) + 4)], 0f)
    compute[((i0.outer.inner*8) + 5)] = max(placeholder[((i0.outer.inner*8) + 5)], 0f)
    compute[((i0.outer.inner*8) + 6)] = max(placeholder[((i0.outer.inner*8) + 6)], 0f)
    compute[((i0.outer.inner*8) + 7)] = max(placeholder[((i0.outer.inner*8) + 7)], 0f)
  }
  compute[128] = max(placeholder[128], 0f)
  compute[129] = max(placeholder[129], 0f)
}

ZihengJiang · 2019-11-12T19:07:09Z

@kimishpatel Could you update the PR to re-trigger the CI?

factors and resulting nested loop is broken. This is due to the fact that we are creating zero extent loops which are fixed afterwards. However unroll pass breaks due to the zero extent loop.

ZihengJiang · 2019-11-14T22:56:53Z

Approved. Thanks for contributing! @kimishpatel

…pache#4243) factors and resulting nested loop is broken. This is due to the fact that we are creating zero extent loops which are fixed afterwards. However unroll pass breaks due to the zero extent loop.

apivovarov · 2019-12-04T08:26:14Z

@kimishpatel Conv2d_transpose kernel 2x2, strides (2,2) fails for CUDA - Cannot prove
https://discuss.tvm.ai/t/conv2d-transpose-kernel-2x2-strides-2-2-fails-for-cuda-cannot-prove/5020/1

kimishpatel · 2019-12-04T16:15:19Z

Thanks @apivovarov for bringing this to attention. Will take a look.

umangyadav reviewed Nov 1, 2019

View reviewed changes

tqchen assigned ZihengJiang and unassigned ZihengJiang Nov 1, 2019

tqchen assigned ZihengJiang Nov 6, 2019

This fixes a specific case when loop partitioning with indivisble

8627ad8

factors and resulting nested loop is broken. This is due to the fact that we are creating zero extent loops which are fixed afterwards. However unroll pass breaks due to the zero extent loop.

kimishpatel force-pushed the loop_partition_fix branch from 8918b5f to 8627ad8 Compare November 14, 2019 06:45

ZihengJiang approved these changes Nov 14, 2019

View reviewed changes

ZihengJiang merged commit d58b733 into apache:master Nov 15, 2019

yzhliu mentioned this pull request Nov 22, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

kimishpatel mentioned this pull request Dec 4, 2019

CUDA support pytorch/tvm#143

Merged

apivovarov mentioned this pull request Dec 5, 2019

Conv2d_transpose kernel 2x2, strides (2,2) fails for CUDA - Cannot prove #4470

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix broken loop partitioning due to recent changes. #4243

Fix broken loop partitioning due to recent changes. #4243

kimishpatel commented Nov 1, 2019

kimishpatel commented Nov 1, 2019

umangyadav commented Nov 1, 2019 •

edited

Loading

umangyadav Nov 1, 2019 •

edited

Loading

kimishpatel Nov 2, 2019

umangyadav Nov 4, 2019

kimishpatel Nov 5, 2019

yinghai Nov 6, 2019

umangyadav commented Nov 1, 2019 •

edited

Loading

kimishpatel commented Nov 5, 2019

ZihengJiang commented Nov 8, 2019

kimishpatel commented Nov 8, 2019

kimishpatel commented Nov 8, 2019

ZihengJiang commented Nov 12, 2019

ZihengJiang commented Nov 14, 2019

apivovarov commented Dec 4, 2019 •

edited

Loading

kimishpatel commented Dec 4, 2019

Fix broken loop partitioning due to recent changes. #4243

Fix broken loop partitioning due to recent changes. #4243

Conversation

kimishpatel commented Nov 1, 2019

kimishpatel commented Nov 1, 2019

umangyadav commented Nov 1, 2019 • edited Loading

umangyadav Nov 1, 2019 • edited Loading

Choose a reason for hiding this comment

kimishpatel Nov 2, 2019

Choose a reason for hiding this comment

umangyadav Nov 4, 2019

Choose a reason for hiding this comment

kimishpatel Nov 5, 2019

Choose a reason for hiding this comment

yinghai Nov 6, 2019

Choose a reason for hiding this comment

umangyadav commented Nov 1, 2019 • edited Loading

kimishpatel commented Nov 5, 2019

ZihengJiang commented Nov 8, 2019

kimishpatel commented Nov 8, 2019

kimishpatel commented Nov 8, 2019

ZihengJiang commented Nov 12, 2019

ZihengJiang commented Nov 14, 2019

apivovarov commented Dec 4, 2019 • edited Loading

kimishpatel commented Dec 4, 2019

umangyadav commented Nov 1, 2019 •

edited

Loading

umangyadav Nov 1, 2019 •

edited

Loading

umangyadav commented Nov 1, 2019 •

edited

Loading

apivovarov commented Dec 4, 2019 •

edited

Loading