Upgrade AutoTensorCore to a TIR Pass #5741
Comments
#5498 is kind of related to this one. Is it possible to somehow represent the warp-wise reduce in the schedule, so that the TIR analyzer and rewriter can detect this opportunity for matching TensorCore?
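For concreteness, a minimal sketch (shapes and names are illustrative) of one way TE can already express a warp-wise reduction in the schedule: binding the reduce axis to threadIdx.x lowers to a cross-thread allreduce in the TIR, which an analyzer could then pattern-match.

```python
import tvm
from tvm import te

# Sketch: bind the reduce axis to threadIdx.x so that the lowered TIR
# contains a cross-thread (warp-wise) allreduce an analyzer could
# pattern-match. Shapes are illustrative.
n = 4096
A = te.placeholder((n, 32), name="A")
k = te.reduce_axis((0, 32), name="k")
B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")

s = te.create_schedule(B.op)
s[B].bind(s[B].op.axis[0], te.thread_axis("blockIdx.x"))
s[B].bind(s[B].op.reduce_axis[0], te.thread_axis("threadIdx.x"))

# The lowered body contains a call to tir.tvm_thread_allreduce.
print(tvm.lower(s, [A, B], simple_mode=True))
```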
Thanks! We've also noticed that the current implementation of AutoTensorCore is not clean enough.
Thanks @jcf94. I agree that developing a better approach to tensorization is important and we should continue to push in that direction. My specific question is about what to do with the pass in its current state. Specifically, there are a few options:

- A0: Remove the pass for now and bring the feature back once we find a better solution.
- A1: Keep the pass in the codebase and continue migrating it toward a state that fits the current design.
- A2: Keep the rewrite as a special-case post-processor outside the pass infrastructure.
In all of these cases, the AutoTensorCore pass as it stands will eventually be removed once we find a better solution. The options have different pros and cons. A2 brings a design friction point to the overall architecture and could cause problems if we want to release before a better solution is found. A0 is the easiest for the developers, but it also means the feature will be unavailable until we find a better solution. A1 preserves the codebase while we continue migrating it to a state that fits the current design; of course, it puts more demands on the developers themselves.

This is an interesting case: the code itself has become a technical debt that we, as maintainers, need to pay. It is fun to develop new features. In the meanwhile, maintaining existing code, revisiting the design, and migrating it to better infrastructure is equally important, if not more so, for a healthy project, as all new features eventually become technical debt when other new features are added on top. It is important for us to keep up infrastructure innovation and refactoring so that the number of key concepts stays at a minimum and we can evolve more effectively to deliver great new features.

Would love to see thoughts with regard to the three options.
Thanks @tqchen.
Can't agree more!
Closing for now due to inactive status.
AutoTensorCore is a pattern-detection utility that detects the matrix multiplication pattern and rewrites the compute to make use of the TensorCore intrinsics (#4234).
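For reference, a minimal sketch of the kind of compute the utility targets: a float16 matrix multiplication with a float32 accumulator (shapes and names here are illustrative, not the exact set of patterns the matcher accepts).

```python
import tvm
from tvm import te

# Illustrative fp16 matmul with fp32 accumulation -- the style of
# compute pattern the AutoTensorCore rewrite looks for.
n, m, l = 1024, 1024, 1024
A = te.placeholder((n, l), name="A", dtype="float16")
B = te.placeholder((l, m), name="B", dtype="float16")
k = te.reduce_axis((0, l), name="k")
C = te.compute(
    (n, m),
    lambda i, j: te.sum(
        A[i, k].astype("float32") * B[k, j].astype("float32"), axis=k
    ),
    name="C",
)
```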
However, because part of the pattern analysis depends on tensor expression information and the rewrite is coupled with that analysis, it does not qualify as a pass.
Under the unified IR, a transformation pass should take a PrimFunc in the IRModule as input and produce another PrimFunc as output. A pass should not take additional information from the high-level DSL stages. That being said, we could apply transformations at the high level to add decorations (e.g., pragmas) to a loop to achieve the same goal.
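For illustration, a minimal sketch of the shape such a PrimFunc-to-PrimFunc pass takes in the current pass infrastructure; the pass name is hypothetical and the body is a stub.

```python
import tvm

# A minimal PrimFunc-to-PrimFunc pass skeleton under the unified IR.
# A real implementation would pattern-match the (pragma-annotated)
# region and rewrite it to TensorCore intrinsics; this stub only
# shows the required signature.
@tvm.tir.transform.prim_func_pass(opt_level=0)
def rewrite_tensor_core(func, mod, ctx):
    return func

# Usage: the decorated object behaves as a pass over an IRModule,
# e.g. mod = rewrite_tensor_core(mod).
```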
Due to this restriction, the AutoTensorCore rewrite has been temporarily moved to a special post-processor in
https://github.com/apache/incubator-tvm/blob/master/src/te/schedule/schedule_postproc_rewrite_for_tensor_core.cc
However, this rewriting should really qualify as a pass. As part of the unified IR effort, we want to reduce the "non-pass" transformations to a minimum set (only the lowering from TE to TIR).
This issue tracks the problem and discusses potential solutions. There are two potential ways to migrate the pass:

- E0: Move the pattern detection into TIR itself, making the rewrite a self-contained TIR pass.
- E1: Keep part of the analysis at the schedule level, but record its result as loop decorations (e.g., pragmas) that a TIR pass then consumes to perform the rewrite.
Ideally, E0 is preferred. Notably, @Hzfengsy is also working on related changes to TIR to make direct pattern detection in TIR easier.
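For the E1 flavor, a minimal sketch of how a schedule-level pragma surfaces in the lowered TIR, where a pass could pick it up. The pragma key "tensorize_hint" is invented for illustration; the current post-processor is triggered by a similar tensor_core pragma.

```python
import tvm
from tvm import te, tir

# Annotate a loop with a pragma in the schedule, lower, and scan the
# resulting PrimFunc for the matching AttrStmt. Shapes, names, and
# the pragma key are illustrative.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
s[B].pragma(s[B].op.axis[0], "tensorize_hint")
mod = tvm.lower(s, [A, B])

hits = []
def visit(node):
    # Schedule pragmas surface in TIR as AttrStmt nodes whose
    # attr_key is the pragma name prefixed with "pragma_".
    if isinstance(node, tir.AttrStmt) and node.attr_key == "pragma_tensorize_hint":
        hits.append(node)

tir.stmt_functor.post_order_visit(mod["main"].body, visit)
print("found pragma:", len(hits) == 1)
```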