
Upgrade AutoTensorCore as to a TIR Pass #5741

Closed · tqchen opened this issue Jun 6, 2020 · 6 comments

tqchen (Member) commented Jun 6, 2020

AutoTensorCore is a pattern detection utility that detects the matrix multiplication pattern and rewrites the compute to make use of the TensorCore intrinsics (#4234).

However, because part of the pattern analysis depends on tensor expression information and is coupled with that analysis, it does not qualify as a pass.

Under the unified IR, a transformation pass should take a PrimFunc from the IRModule as input and output another PrimFunc. A pass should not take additional information from the high-level DSL stages. That being said, we could apply transformations at the high level to add decorations (e.g. pragmas) to a loop to achieve the same goal.
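
For concreteness, here is a minimal sketch of that contract using TVM's tvm.tir.transform.prim_func_pass decorator; the pass body and the attribute key are hypothetical, purely to show the PrimFunc-in/PrimFunc-out shape:

```python
import tvm

# The unified-IR pass contract: receive a PrimFunc (plus the enclosing
# IRModule and the pass context) and return a PrimFunc, without consulting
# any te-stage information.
@tvm.tir.transform.prim_func_pass(opt_level=0)
def annotate_tensorcore(func, mod, ctx):
    # Illustrative only: tag the function with a hypothetical attribute.
    # A real AutoTensorCore pass would rewrite the loop nest here instead.
    return func.with_attr("hypothetical.auto_tensorcore", True)

# Like any pass, it applies to a whole IRModule: mod = annotate_tensorcore(mod)
```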

Due to the current restriction, the AutoTensorCore rewrite has temporarily been moved to a special post-processor in
https://github.com/apache/incubator-tvm/blob/master/src/te/schedule/schedule_postproc_rewrite_for_tensor_core.cc

However, this rewriting should really qualify as a pass. As part of the unified IR effort, we want to reduce the "non-pass" transformations to a minimum set (only the lowering from te to TIR).

This issue tracks the problem and discusses potential solutions. There are two potential ways to migrate the pass.

  • E0: Directly migrate the matmul pattern detector to search over the loop nest, instead of the te stage.
  • E1: If analysis on the te stage is necessary, run a lightweight transformation on te to tag the tensor-core related information.

Ideally E0 is preferred; a rough sketch of what an E0-style detector could look like is included below. Notably, @Hzfengsy is also working on related changes to TIR to make direct pattern detection in TIR easier.
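
As a rough illustration of the E0 direction (a sketch, not code from this thread), a TIR-side detector could walk the loop nest looking for the multiply-accumulate store at the core of a matmul; a real detector would also have to verify the loop extents, buffer indexing, and dtypes:

```python
from tvm import tir
from tvm.tir import stmt_functor

def find_mma_stores(func: tir.PrimFunc):
    """Collect stores of the shape C[...] = C[...] + A[...] * B[...].
    Deliberately simplified: only the store pattern is checked, not the
    surrounding loop structure."""
    candidates = []

    def visit(node):
        if isinstance(node, tir.BufferStore) and isinstance(node.value, tir.Add):
            acc, update = node.value.a, node.value.b
            if (isinstance(acc, tir.BufferLoad)
                    and acc.buffer.same_as(node.buffer)  # accumulates into C
                    and isinstance(update, tir.Mul)):
                candidates.append(node)

    stmt_functor.post_order_visit(func.body, visit)
    return candidates
```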

@tqchen changed the title from "Upgrade AutoTensorCore as a Pass" to "Upgrade AutoTensorCore as to a TIR Pass" on Jun 6, 2020
were (Contributor) commented Jun 6, 2020

#5498 is kind of related to this one.
The current TensorCore code generation is tricky: since we have no fundamental warp reduction support in TVM, generating TensorCore code inevitably messes up the thread binding. No threadIdx was bound to a reduce axis before, but we inevitably need a threadIdx for that reduce axis.

Is it possible to somehow represent the warp-wise reduction in the schedule, so that the TIR analyzer and rewriter can detect this opportunity to match TensorCore?
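
For what it's worth, here is a minimal sketch (assuming the classic te schedule API, with illustrative shapes) of one way to express a warp-wide reduction today: split the reduce axis and bind the inner part to threadIdx.x, which TVM lowers as a cross-thread reduction:

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
k = te.reduce_axis((0, n), name="k")
B = te.compute((1,), lambda i: te.sum(A[k], axis=k), name="B")

s = te.create_schedule(B.op)
ko, ki = s[B].split(k, factor=32)
s[B].bind(B.op.axis[0], te.thread_axis("blockIdx.x"))
# Binding a reduce axis to threadIdx.x makes TVM emit a cross-thread
# (warp-level) reduction across the 32 bound lanes.
s[B].bind(ki, te.thread_axis("threadIdx.x"))

print(tvm.lower(s, [A, B], simple_mode=True))
```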

jcf94 (Contributor) commented Jun 7, 2020

cc @minminsun @Hzfengsy @merrymercy @jcf94 @yangjunpro

Thanks! We've also noticed that the current implementation of AutoTensorCore is not clean enough.
We're now working on enabling TVM to automatically generate schedules in the Ansor project with @merrymercy, and we also had some discussions with @Hzfengsy during the development.
Auto TensorCore codegen support is an important feature for us; we'll continue to work on it and try to figure out a better way.

tqchen (Member, Author) commented Jun 7, 2020

Thanks @jcf94. I agree that building a better approach for tensorization is important, and we should continue to push in that direction.

My specific issue is about what to do with the pass in its current state. Specifically, there are a few options:

  • A0: Temporarily remove AutoTensorCore, assuming it has limited use cases and most current TensorCore usage goes through the tensorcore intrinsics; add a better solution back later.
  • A1: Migrate AutoTensorCore to TIR by pattern matching the loop nest (instead of the compute expression) so that it becomes part of the TIR passes; replace it later once we have a better solution.
  • A2: Keep AutoTensorCore in its current location, which introduces a friction point into the overall design; maintain it as we refactor the codebase while tolerating the design friction, and remove it once we have a better solution.

In all of these cases, the AutoTensorCore pass as it is will eventually be removed once we find a better solution. The options have different pros and cons. A2 brings a design friction point into the overall architecture and could cause problems if we want to release before a better solution exists. A0 is the easiest for the developers, but it also means the feature will be unavailable until a better solution arrives. A1 keeps the feature in the codebase while migrating it toward a state that fits the current design; of course, it puts more demands on the developers themselves.

This is an interesting case where the code itself becomes a technical debt that we, as maintainers, need to pay. It is fun to develop new features. In the meanwhile, maintaining existing code, revisiting the design, and migrating it to a better infrastructure is equally important, if not more important, for a healthy project, since all new features eventually become technical debt as other new features are added on top. It is important for us to keep up infrastructure innovation and refactoring that reduce the number of key concepts back to a minimum, so that we can evolve more effectively and deliver great new features.

Would love to hear thoughts with respect to the three options.

minminsun (Contributor) commented

Thanks @tqchen.

"It is fun to develop new features. In the meanwhile, maintaining existing code, revisiting the design, and migrating it to a better infrastructure is equally important."

Can't agree more!
Just as @jcf94 said, this pass is required for Ansor to generate code for TensorCore, so we prefer not to remove it for now. We will try to figure out the possibility of matmul pattern matching on TIR instead of TE.

@tqchen closed this as completed on Nov 1, 2020
tqchen (Member, Author) commented Nov 1, 2020

Closing for now due to inactive status.
