-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TIR][Transform]Block scope hoisting added #6238
Conversation
3d5b347
to
4fc88c9
Compare
07d8da6
to
3f6a500
Compare
assert(not tvm.ir.structural_equal(new_stmt, stmt)) | ||
|
||
@pytest.mark.skip() | ||
def test_hoisting_op_conv(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: While adding a performance bench-marking test case like this, i am facing a weird behavior in terms of time taken during execution. Sometimes the test case hangs for a very long time, even CI execution halted because of this.
I am debugging it currently. That is why the test case is skipped for the time.
However i have verified the behavior is not due to the Hoisting Pass added. So i think we can continue with the current PR. Thanks!
@@ -93,11 +112,33 @@ using HoistForIfTuple = std::tuple<bool, const ForNode*, const IfThenElseNode*>; | |||
* if (likely(j > 2)) | |||
* A[i+j+k] = B[i+j+k] | |||
* | |||
* | |||
* This pass do hoisting for Block scope variables also. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term "Block scope variables" is a little bit confusing to me. Is it equivalent to say variables defined in Attr
nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is referring to Attr nodes.
"Block scope variables" is just for my internal mapping :)
python/tvm/driver/build_module.py
Outdated
@@ -181,7 +181,7 @@ def lower(sch, | |||
tvm.tir.transform.BF16Legalize(), | |||
tvm.tir.transform.NarrowDataType(32), | |||
tvm.tir.transform.Simplify(), | |||
tvm.tir.transform.HoistIfThenElse(), | |||
tvm.tir.transform.HoistIfThenElse("basic"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this "basic" pass? Can we directly perform a complete pass in Phase 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can combine both also and place the pass in End phase.
I understand the dilemma here. Even i also had the same :)
The reasoning i came up later is as below.
Current PR supports the feature for "Block scope vars" or "Attr nodes" which happens to be more applicable in specific cases(For example in Cuda Kernels). Also there is slight increase in time complexity(As linear).
So to sum up, we have 2 cases :
Case 1: "Basic" or "Default": The scenarios covered here should be more general(simpler version) across.
Case 2: "Advanced" : The scenarios covered here should be enabled in case of particular settings.
Please let me know your thought on above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a little more compile-time complexity is a real problem. Perhaps the pass can even be faster than those Python bindings. We can always perform the "advanced" version as long as it won't break the correctness or do negative optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right! The compile-time complexity is not the real point here. I just shared the info :)
I am okay for the both the approach.
However i think it would be good to have one additional segregation inside Pass to have more control on the different scenarios it has covered. Which can provide more user friendly experience, when user wants to club the Pass with only specific Passes without needing to write special config parameters.
Let us have some more opinion on this point, to conclude better.
@MarisaKirisame : Would please help share your thoughts on above point. TIA!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use the advanced option. It is more general and the compile time is light juding from the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MarisaKirisame for sharing your thoughts here.
With this i think we can conclude the comment here, by removing the "Basic" pass embedding in the lower sequence of passes, and place only one hoisting pass at the third phase as per latest change.
If my understanding is wrong, please let me know. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is good for me.
@roastduck , @MarisaKirisame : Thanks for reviewing sharing your valuable comments. I have handled most of the comments. Please check. TIA! |
@ANSHUMAN87 please rebase and fix the CI. |
c7645ec
to
33ea539
Compare
@MarisaKirisame : CI is fixed now! Would you please provide your valuable opinion on this section. TIA! |
Gentle ping @MarisaKirisame ! |
@ANSHUMAN87 sorry! I was busy on other stuff. I will review rn. |
@MarisaKirisame , @roastduck : All your comments are handled now. Thanks! |
Thanks @ANSHUMAN87 @roastduck for the work! |
* Block scope hoisting added * lowering flow added with 2 variants * Fake commit to trigger ci with pass default enabled * CI Failure resolved * Optimize for if var list iteration * More test case added * Fake commit to disable failed test cases * Pass default value restored * [1] Review comment handled * [2] Review comments handled
* Block scope hoisting added * lowering flow added with 2 variants * Fake commit to trigger ci with pass default enabled * CI Failure resolved * Optimize for if var list iteration * More test case added * Fake commit to disable failed test cases * Pass default value restored * [1] Review comment handled * [2] Review comments handled
* Block scope hoisting added * lowering flow added with 2 variants * Fake commit to trigger ci with pass default enabled * CI Failure resolved * Optimize for if var list iteration * More test case added * Fake commit to disable failed test cases * Pass default value restored * [1] Review comment handled * [2] Review comments handled
This PR addresses one of the open points from PR(#6066 )
Please refer section
TODO:
Insert the 2 variants of hoisting in proper places in lowering flow
Add comments to explain the scenario
Add test cases with basic, with ops, both cpu & gpu
Share bench-marking results