Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten split's extent #4931

Merged

Conversation

yongfeng-nv
Copy link
Contributor

@yongfeng-nv yongfeng-nv commented Feb 24, 2020

PassDownDomain always sets inner IterVar's extent to the split factor, even if the parent extent is less than the factor (usually happens when the parent has been reduced to a single point). Although predicates are added to guard bounds, unnecessary likely statements are inserted. They are road blocks for tensorization/intrinsic.

Here is an example from the new test:
Before:

// attr [EF] storage_scope = "global"
allocate EF[float32 * 8]
produce E {
  produce EF {
    for (i, 0, 8) {
      EF[i] = (I[i]*2f)
    }
  }
  for (i.inner, 0, 32) {
    if (likely((i.inner < 8))) {
      if (likely((i.inner < 8))) {
        EF[i.inner] = (EF[i.inner]*2f)
      }
    }
  }
}

After:

// attr [EF] storage_scope = "global"
allocate EF[float32 * 8]
produce E {
  produce EF {
    for (i, 0, 8) {
      EF[i] = (I[i]*2f)
    }
  }
  for (i.inner, 0, 8) {
    EF[i.inner] = (EF[i.inner]*2f)
  }
}

Solution:
Tighten iv's extent to min(parent_extent, factor_or_nparts), only if all of the
following three conditions are met:

  1. No leaf IterVar is derived from iv binds to any thread. People may use split
    to force an IterVar extent to match the number of allocated threads to fuse stages
    that require different number of threads. We don't want to change these extents.
  2. allow_missing is false, i.e. that PassDownDomain is called by the final InferBound,
    rather than by an early compiler phase, such as rfactor(). We don't want to tighten an
    IterVar in an early phase allowing missing IterVars, because it may bind to a thread later.
  3. range_parent's extent is not 0. At lest one Topi test has a case where a tensor has one
    zero-sized dimension. Split creates iv with a positive extent to avoid zero-extent
    IterVar. We don't touch it.

An earlier attempt to this issue is in this PR (#4885). This PR also reports it (#4932).

Follow-up to the discussion in #4885:

  1. Comparing with Split node min range is not stringent. #4885, this one doesn't have to change existing tests/schedules and keeps allowing the following use case:
for (int i = 0; i < 31; ++i) {
   B[i] = C[i] + 1
}
for (int j = 0; j < 32; ++j) {
   C[i] = A[i] * 2
}
  1. The running time for tests/python/unittest/test_schedule_tensor_core.py:test_tensor_core_batch_conv reduces further (Original: 0.060 ms, Split node min range is not stringent. #4885: 0.035 ms, this one: 0.026ms).

@yongfeng-nv yongfeng-nv force-pushed the split-node-min-range-only-at-final-pass branch from 19f6d96 to 15b4e8f Compare February 24, 2020 22:02
@merrymercy merrymercy self-assigned this Feb 24, 2020
@yongfeng-nv yongfeng-nv changed the title WIP: Alternative solution to Split node min range is not stringent (4885). Don't Merge. Alternative solution to tight split's extent Feb 25, 2020
@yongfeng-nv yongfeng-nv changed the title Alternative solution to tight split's extent Tighten split's extent Feb 25, 2020
@yongfeng-nv
Copy link
Contributor Author

@merrymercy, I have updated this PR description and refactor the code. All tests are clear. Please review it. Thank you.

Copy link
Member

@merrymercy merrymercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use const reference when possible.
The logic looks good to me.
cc @tqchen for double check.

src/te/schedule/message_passing.cc Outdated Show resolved Hide resolved
src/te/schedule/message_passing.cc Outdated Show resolved Hide resolved
src/te/schedule/message_passing.cc Outdated Show resolved Hide resolved
src/te/schedule/message_passing.cc Outdated Show resolved Hide resolved
@yongfeng-nv
Copy link
Contributor Author

Use const reference when possible.
The logic looks good to me.
cc @tqchen for double check.

Thank you for the careful review. I forgot to check references after refactoring.

@yongfeng-nv yongfeng-nv force-pushed the split-node-min-range-only-at-final-pass branch from 13747ff to 79c5ecf Compare February 28, 2020 16:54
@yongfeng-nv
Copy link
Contributor Author

I don't understand the current failure: https://ci.tvm.ai/blue/organizations/jenkins/tvm/detail/PR-4931/7/pipeline. How do I handle it?

@comaniac
Copy link
Contributor

This CI failure has been fixed several hours ago. Rebasing this branch to upstream and the errors should be gone.

@yongfeng-nv yongfeng-nv force-pushed the split-node-min-range-only-at-final-pass branch from 79c5ecf to 90d7381 Compare February 29, 2020 03:22
…arts, but only when PassDownDomain is called with allow_missing == false, i.e. by InferBound. Add a helper PassUpThreadBinding() to get a map telling whether an IterVar has at least one leaf IterVar deriving from it binding to a thread. Add two unit tests.
…from testtopi/tests/python/test_topi_transform.py::test_tile.
@yongfeng-nv
Copy link
Contributor Author

yongfeng-nv commented Mar 4, 2020

I need some help to reproduce the failure locally. I tried the CI twice and both ended with the same failure (e.g. https://ci.tvm.ai/blue/organizations/jenkins/tvm/detail/PR-4931/9/pipeline/). But neither I or fellow engineer was able to reproduce the failure on our local machines. Any suggestion?

I don't mind adding a few temporary debugging dumping code to my change and the test, but the test is the very last one in the entire CI and takes 2-3 hours to reach. Is there any way to get me quickly to this test in CI?

@tqchen

@tqchen
Copy link
Member

tqchen commented Mar 4, 2020

The error appears to be a flaky test error that may not have things to do with the PR, please try to send a dummy commit to retrigger

@yongfeng-nv yongfeng-nv force-pushed the split-node-min-range-only-at-final-pass branch from f97f5d7 to 40ecacc Compare March 4, 2020 15:22
@yongfeng-nv yongfeng-nv requested a review from merrymercy March 4, 2020 18:46
@merrymercy merrymercy merged commit 585f9ce into apache:master Mar 4, 2020
@yongfeng-nv
Copy link
Contributor Author

@merrymercy, I have made your recommended changes. Please review it again. Can you share your use case hitting this issue?

@tqchen, the flaky test finally passed. Thank you for the help. Please double check the logic of this PR and share your thoughts.

@tqchen
Copy link
Member

tqchen commented Mar 9, 2020

This PR is among one of the PRs affected by the github squash commit bug. We take every contribution serious in the TVM community. The community has decided to use revert/redo approach to amend the contributions as per #5015

@yongfeng-nv Please let us know if you would like us to revert the PR and resend the contribution. Thank you. cc @merrymercy

@yongfeng-nv
Copy link
Contributor Author

This PR is among one of the PRs affected by the github squash commit bug. We take every contribution serious in the TVM community. The community has decided to use revert/redo approach to amend the contributions as per #5015

@yongfeng-nv Please let us know if you would like us to revert the PR and resend the contribution. Thank you. cc @merrymercy

@tqchen Sure. Thanks.

@tqchen
Copy link
Member

tqchen commented Mar 10, 2020

@merrymercy can you followup by reverting this PR first? Then @yongfeng-nv can send another PR back

merrymercy added a commit that referenced this pull request Mar 10, 2020
tqchen pushed a commit that referenced this pull request Mar 11, 2020
@tqchen
Copy link
Member

tqchen commented Mar 11, 2020

@yongfeng-nv the revert PR has been merged. Please send another PR back :)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
* Set split node's range to minimum of ext and split factor or split nparts, but only when PassDownDomain is called with allow_missing == false, i.e. by InferBound.  Add a helper PassUpThreadBinding() to get a map telling whether an IterVar has at least one leaf IterVar deriving from it binding to a thread. Add two unit tests.

* Enhance LoopVectorizer for vectorizing by 0.  Found at least one case from testtopi/tests/python/test_topi_transform.py::test_tile.

* Revert changes vectorize_loop.cc; when parent's ext is zero, set split's range to the factor or nparts.

* Update with comments.

* Refactor the ext tightening predicate.

* Fix reference types.

* Integrate tvm.te changes.

* Trivial comment change to trigger CI.

* Trivial comment correction to trigger testing.
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
* Set split node's range to minimum of ext and split factor or split nparts, but only when PassDownDomain is called with allow_missing == false, i.e. by InferBound.  Add a helper PassUpThreadBinding() to get a map telling whether an IterVar has at least one leaf IterVar deriving from it binding to a thread. Add two unit tests.

* Enhance LoopVectorizer for vectorizing by 0.  Found at least one case from testtopi/tests/python/test_topi_transform.py::test_tile.

* Revert changes vectorize_loop.cc; when parent's ext is zero, set split's range to the factor or nparts.

* Update with comments.

* Refactor the ext tightening predicate.

* Fix reference types.

* Integrate tvm.te changes.

* Trivial comment change to trigger CI.

* Trivial comment correction to trigger testing.
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants