[QNN] Enable constant folding for QNN operations. #11228
Conversation
"""Fold the constant expressions in a Relay program. | ||
Parameters | ||
---------- | ||
expr: Expr | ||
The expression to fold | ||
mod: IRModule | ||
The module the expr lives in (for global calls) | ||
fskip: bool |
The argument name here doesn't match the one in the function. Please review the others as well.
You are right, thank you! I will fix it.
Fixed in 2 places
@ibsidorenko There is an unrelated lint error; can you rebase and push again?
This commit enables constant folding for QNN operations. This functionality is disabled by default; use fold_qnn=True to enable it. Co-authored-by: Alexander Peskov <[email protected]>
Force-pushed from 182cad4 to 66ddb3e.
Thanks for reviving this. The change requested as part of this review is to add more test cases covering more qnn ops, since we are claiming to support the generic fold_qnn here.
In addition to that, I'm a bit worried about compounding Legalize and ConstFolding together and calling the union ConstFolding.
As we discussed in the other PR, I feel it's clearer to explicitly run Legalize before doing the ConstFolding. This is because I'd expect the IR after ConstFolding to simply be folded down, rather than to have new ops introduced (that it was legalized to). Do you have a good reason why we need to compound this behavior?
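For concreteness, a minimal sketch of the explicit ordering being discussed. Using relay.qnn.transform.CanonicalizeOps as the "Legalize" step is my assumption about which lowering pass is meant, and the toy module is illustrative only:

```python
import tvm
from tvm import relay

# Toy module: a qnn op fed by a non-constant input (illustrative only).
x = relay.var("x", shape=(1, 4), dtype="int8")
deq = relay.qnn.op.dequantize(x, relay.const(0.5), relay.const(0))
mod = tvm.IRModule.from_expr(deq)

# Explicit ordering: lower qnn ops to core relay ops first, then fold.
seq = tvm.transform.Sequential([
    relay.qnn.transform.CanonicalizeOps(),  # qnn.* -> core relay ops
    relay.transform.FoldConstant(),         # plain constant folding
])
mod = seq(mod)
```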
@manupa-arm I have added more simple unit tests for other QNN ops (quantize/requantize/conv2d/add/mul).
In short, that's because of BYOC. @masahi answered this quite correctly in a previous discussion. I will try to explain in a little more detail. In my particular case I have to know whether a tensor is constant or not before applying the "partition_for_xxx" pass. Imagine that you have a device which can process the conv2d primitive only when its weights are constant. "Constant" here means that the weight data are available at the device initialisation step, so the device can apply some HW-specific transformations and copy the weights into the proper HW-specific memory. Moreover, we do not know the type of weight transformation during TVM compilation, because it depends on the particular type of HW and the device state. So we have to partition the graph taking these requirements into account. The patterns might look like the following:
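A hedged reconstruction of the two patterns (the original snippet did not survive extraction; the exact argument layout of qnn.conv2d here is an assumption):

```python
from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

# pat_1: matches qnn.conv2d only when the weights are a relay.Constant,
# so the BYOC device can consume them at Init() time.
pat_1 = is_op("qnn.conv2d")(
    wildcard(),     # data
    is_constant(),  # weights must already be a folded Constant
    wildcard(), wildcard(), wildcard(), wildcard(),  # zero points / scales
)

# pat_2: accepts anything in the weight position, so even constant
# weights would be passed to the BYOC function as a regular argument.
pat_2 = is_op("qnn.conv2d")(
    wildcard(),  # data
    wildcard(),  # weights treated as a regular var
    wildcard(), wildcard(), wildcard(), wildcard(),
)
```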
The pattern 'pat_2' is not suitable for our case because it treats the second argument as a regular var regardless of whether it is constant or not: the weight tensor would be passed to the BYOC function as a regular argument of the Run() method, not of Init(). So we would like to use 'pat_1'. To support 'pat_1' we have to fold all constant subgraphs (like 'qnn.quantize(const_weight_fp32)') into real constants before applying the partitioner pass, otherwise the pattern will be skipped. Applying the legalization pass before constant folding would decompose 'qnn.conv2d' as well, and pattern 'pat_1' would not be matched anyway. In total, legalization + constant folding does not solve the problem. The shortest way I found is to conditionally decompose qnn primitives only for constant subgraphs, which is equivalent to adding qnn primitives to the constant folding pass. And I think it's the right direction. An alternative way would be to introduce one more pattern helper.
Thanks @ibsidorenko for the tests. Thanks @apeskov for the detailed explanation. IIUC, the requirement is to find Exprs that would be folded down to a relay.Constant (going through qnn folding in the process) while keeping the rest of the qnn ops non-legalized. I recognize this feature as important and am happy to see it being completed. Sorry, I did not fully understand before what you were trying to achieve here. However, looking at the test cases, I could not see that the above-mentioned feature is achieved, because all the test cases started with const arguments for the qnn ops. That left me wondering why we can't achieve this by running a Sequential of Legalize and FoldConstant. Since this change is being made in an IRModule -> IRModule pass, I think it would be good to have a test case where we can observe that such constant subgraphs are folded while the rest of the qnn ops (those that do not have all arguments constant) are not legalized.
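A minimal sketch of the kind of test being requested (names, shapes, and quantization parameters are illustrative assumptions, not code from the PR):

```python
import numpy as np
import tvm
from tvm import relay

def test_fold_qnn_keeps_non_const_ops():
    data = relay.var("data", shape=(1, 4), dtype="int8")
    # Constant subgraph: quantize of a constant fp32 weight -> should fold.
    w_fp32 = relay.const(np.ones((4, 4), dtype="float32"))
    w_int8 = relay.qnn.op.quantize(w_fp32, relay.const(0.5), relay.const(0))
    # Non-constant op: dequantize of a variable -> should stay a qnn op.
    deq = relay.qnn.op.dequantize(data, relay.const(0.5), relay.const(0))
    mod = tvm.IRModule.from_expr(relay.Tuple([w_int8, deq]))

    folded = relay.transform.FoldConstant(fold_qnn=True)(mod)
    body = folded["main"].body

    # The constant branch has been folded down to a relay.Constant ...
    assert isinstance(body.fields[0], relay.Constant)
    # ... while the branch fed by a variable is still a qnn.dequantize call.
    assert body.fields[1].op.name == "qnn.dequantize"
```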
Following up on our discussion, here is another round of review for the PR.
@leandron @manupa-arm @masahi Could someone approve the workflow to run?
Thanks @ibsidorenko @apeskov! It looks great now!
Thanks everyone for the detailed review! Glad to see two approvals for this PR. @masahi @manupa-arm @jwfromm Could someone press the merge button while there is no merge conflict with main?
* [QNN] Enable constant folding for QNN operations. This commit enables constant folding for QNN operations. This functionality is disabled by default; use fold_qnn=True to enable it. Co-authored-by: Alexander Peskov <[email protected]>
* [NFC] Fixed comments
* Added more unit tests for QNN ops in the constant folding pass.
* Addressed PR feedback

Co-authored-by: Alexander Peskov <[email protected]>
This PR is an attempt to revive PR#9164. It enables folding of constants for QNN operations. The motivation for this feature is BYOC use cases: for some BYOC targets it can help avoid converting weights at runtime and thus improve performance.

One important thing: if we call FoldConstant before the FakeQuantizationToInteger pass, we can prevent FQ2I from converting some ops to their qnn equivalents. To avoid this, folding of QNN constants is disabled by default; use the fold_qnn=True flag in the FoldConstant pass to enable it.

Co-authored-by: Alexander Peskov [email protected]
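A minimal usage sketch of the flag (the toy weight and quantization parameters are illustrative assumptions):

```python
import numpy as np
import tvm
from tvm import relay

# A constant fp32 weight quantized to int8: a fully-constant qnn subgraph.
w = relay.const(np.ones((2, 2), dtype="float32"))
q = relay.qnn.op.quantize(w, relay.const(0.1), relay.const(0))
mod = tvm.IRModule.from_expr(q)

# Default: qnn.quantize is left untouched, so FQ2I can still match it.
print(relay.transform.FoldConstant()(mod))

# With fold_qnn=True the subgraph collapses to a single relay.Constant.
print(relay.transform.FoldConstant(fold_qnn=True)(mod))
```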