
[QNN] Enable constant folding for QNN operations. #11228

Merged: 4 commits merged into apache:main on May 13, 2022

Conversation

ibsidorenko (Contributor)
This PR is an attempt to revive PR#9164. It enables folding of constants for QNN operations. The motivation for this feature is BYOC use cases: for some BYOC backends it helps avoid converting weights at runtime and thus improves performance.

One important thing: if FoldConstant is called before the FakeQuantizationToInteger pass, it can prevent FQ2I from converting some ops to their QNN equivalents. To avoid this, folding of QNN constants is disabled by default; enable it with the fold_qnn=True flag of the FoldConstant pass.
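
For example, a minimal usage sketch of the flag described above; the toy graph, shapes, and scale/zero-point values are illustrative, not taken from this PR:

import numpy as np
import tvm
from tvm import relay

# Hypothetical constant fp32 weight that gets quantized inside the graph.
weight_fp32 = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))
weight_int8 = relay.qnn.op.quantize(
    weight_fp32,
    output_scale=relay.const(0.05),
    output_zero_point=relay.const(0),
    out_dtype="int8",
)
mod = tvm.IRModule.from_expr(weight_int8)

# Default behaviour: the qnn.quantize op is left untouched, so FQ2I can still see it.
mod_default = relay.transform.FoldConstant()(mod)

# Opt in: the constant QNN subgraph is folded down to a relay.Constant.
mod_folded = relay.transform.FoldConstant(fold_qnn=True)(mod)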

Co-authored-by: Alexander Peskov [email protected]

"""Fold the constant expressions in a Relay program.
Parameters
----------
expr: Expr
The expression to fold
mod: IRModule
The module the expr lives in (for global calls)
fskip: bool
Contributor
The argument name here doesn't match with the one in the function. Please review the other ones as well.

@ibsidorenko (Contributor Author) May 6, 2022


You are right, thank you! I will fix it.

@ibsidorenko (Contributor Author)

Fixed in 2 places

@ibsidorenko (Contributor Author)
cc: @manupa-arm @masahi @apeskov

@masahi (Member) commented May 9, 2022

@ibsidorenko There is an unrelated lint error; can you rebase and push again?

ibsidorenko and others added 2 commits May 10, 2022 10:20
This commit enables constant folding for QNN operations.
This functionality is disabled by default; use fold_qnn=True to enable it.

Co-authored-by: Alexander Peskov <[email protected]>
@manupak (Contributor) left a comment


Thanks for reviving this. The change requested as part of this review is to add more test cases covering more qnn ops, since we are claiming to support the generic fold_qnn here.

In addition to that,

I'm a bit worried about compounding Legalize and ConstFolding together and calling the union ConstFolding.

As we discussed in the other PR, I feel it's clearer to explicitly run Legalize before doing the ConstFolding. This is because I'd expect the IR after ConstFolding to simply be folded down rather than to have new ops introduced (that the original ops were legalized to). Do you have a good reason why we need to compound this behavior?

(Resolved review thread on tests/python/relay/test_pass_fold_constant.py)
@ibsidorenko (Contributor Author)

@manupa-arm I have added more simple unit tests for other QNN ops (quantize/requantize/conv2d/add/mul).
I will address your question about Legalize/ConstFolding a bit later...

@apeskov (Contributor) commented May 12, 2022

@manupa-arm

Do you have a good reason why we need to compound this behaviour?

In short, that's because of BYOC. @masahi answered this quite correctly in the previous discussion.

I will try to explain in a little more detail.

In my particular case I have to know whether a tensor is constant or not before applying the "partition_for_xxx" pass. Imagine you have a device which can process the conv2d primitive only when the weights are constant. "Constant" in that case means that the weight data are available at the device initialisation step, and the device can apply some HW-specific transformations and copy the weights into the proper HW-specific memory. Moreover, we do not know the type of weight transformation during TVM compilation because it depends on the particular type of HW and on device state.

So we have to partition the graph taking these requirements into account. The patterns may look like the following:

pat_1 = is_op("qnn.conv2d")(wildcard(), is_constant())  # Good. Strong requirements of constants
pat_2 = is_op("qnn.conv2d")(wildcard(), wildcard())     # Bad. No restrictions. Will match anywhere, with and without const

Pattern 'pat_2' is not suitable for our case because it treats the second argument as a regular var regardless of whether it is constant or not. The weight tensor would be passed to the BYOC function as a regular argument of the Run() method, not of Init(). So we would like to use 'pat_1'.

To support 'pat_1' we have to fold all constant subgraphs (like 'qnn.quantize(const_weight_fp32)') into real constants before applying the partitioner pass, otherwise the pattern will be skipped. Applying the legalization pass before constant folding would decompose 'qnn.conv2d' as well, so pattern 'pat_1' would not be matched anyway. In short, running legalization + constant folding before partitioning doesn't help.

The shortest way I found is to conditionally decompose QNN primitives only for constant subgraphs. That is equivalent to adding QNN primitives to the constant folding pass, and I think it's the right direction.

An alternative would be to introduce one more pattern helper like is_constant_subgraph() and implement lazy initialisation on the BYOC side, but that looks slightly unnatural.
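
For illustration, a sketch of how this could be wired into a BYOC partitioning helper, assuming the fold_qnn flag from this PR; the helper partition_for_my_byoc and the composite name "my_byoc.qnn_conv2d" are hypothetical:

from tvm import relay
from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

def partition_for_my_byoc(mod):
    # Fold constant QNN subgraphs (e.g. qnn.quantize(const_weight_fp32)) into
    # real constants first, so is_constant() below can match them.
    mod = relay.transform.FoldConstant(fold_qnn=True)(mod)

    # Offload only qnn.conv2d calls whose weights are compile-time constants.
    # qnn.conv2d also takes zero points and scales; wildcards cover them here.
    pat = is_op("qnn.conv2d")(
        wildcard(), is_constant(), wildcard(), wildcard(), wildcard(), wildcard()
    )
    return relay.transform.MergeComposite([("my_byoc.qnn_conv2d", pat)])(mod)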

@manupak (Contributor) commented May 12, 2022

Thanks @ibsidorenko for tests.

Thanks @apeskov for the detailed explanation.

IIUC, the requirement is to find Exprs that would be folded down to a relay.Constant (going through qnn folding in the process) while keeping the rest of the qnn ops non-legalized. I recognize this feature to be important and am happy to see it being completed. Sorry, I did not fully understand before what you were trying to achieve here.

However, looking at the test cases, I could not see that the above-mentioned feature is achieved, because all the test cases start with const arguments for qnn ops. That left me wondering why we can't achieve the same by running a sequence of Legalize, FoldConstant.

Since this change is an IRModule --> IRModule pass, I think it would be good to have a test case where we can observe that such constant subgraphs are folded while the rest of the qnn ops (those that do not have all arguments as constants) are not legalized.
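
Something along the lines of the following sketch, assuming the fold_qnn flag; the shapes, scale, and zero-point values are illustrative:

import numpy as np
import tvm
from tvm import relay

def test_fold_qnn_keeps_non_constant_qnn_ops():
    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    weight = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))

    # Constant subgraph: quantize of a constant should fold to a Constant.
    qweight = relay.qnn.op.quantize(weight, relay.const(0.05), relay.const(0), out_dtype="int8")
    # Non-constant subgraph: quantize of runtime data must stay a qnn op.
    qdata = relay.qnn.op.quantize(data, relay.const(0.05), relay.const(0), out_dtype="int8")

    mod = tvm.IRModule.from_expr(relay.Tuple([qweight, qdata]))
    mod = relay.transform.FoldConstant(fold_qnn=True)(mod)

    body = mod["main"].body
    assert isinstance(body.fields[0], relay.Constant)   # folded away
    assert body.fields[1].op.name == "qnn.quantize"     # not legalized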

@manupak (Contributor) left a comment


Following up on our discussion, here is another round of review for the PR.

(Resolved review threads on include/tvm/relay/transform.h, python/tvm/relay/transform/transform.py, and tests/python/relay/test_pass_fold_constant.py)
@apeskov (Contributor) commented May 13, 2022

@leandron @manupa-arm @masahi could someone approve the workflow to run?

@manupak (Contributor) left a comment


Thanks @ibsidorenko @apeskov! It looks great now!

@apeskov (Contributor) commented May 13, 2022

Thanks everyone for the detailed review! Glad to see 2 approvals for this PR.

@masahi @manupa-arm @jwfromm Could someone press the merge button while there is no merge conflict with main?

@masahi masahi merged commit d871bbd into apache:main May 13, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 16, 2022
* [QNN] Enable constant folding for QNN operations.

This commit enables constant folding for QNN operations.
This functionality is disabled by default; use fold_qnn=True to enable it.

Co-authored-by: Alexander Peskov <[email protected]>

* [NFC] Fixed comments

* Added more unit tests for QNN ops in the constant folding pass.

* Address PR feedback

Co-authored-by: Alexander Peskov <[email protected]>
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
shingjan pushed a commit to shingjan/tvm that referenced this pull request May 17, 2022
@ibsidorenko ibsidorenko deleted the fold-qnn-const-v2 branch May 18, 2022 07:04