
[QNN] Enable constant folding for QNN operations. #11228

Merged: 4 commits merged into apache:main on May 13, 2022

Conversation

ibsidorenko (Contributor)
This PR is an attempt to revive PR#9164. It enables folding of constants for QNN operations. The motivation for this feature is BYOC use cases: for some BYOC backends it helps avoid converting weights at runtime and thus improves performance.

One important thing: if FoldConstant is called before the FakeQuantizationToInteger pass, it can prevent FQ2I from converting some ops to their QNN equivalents. To avoid this, folding of QNN constants is disabled by default; enable it with the fold_qnn=True flag of the FoldConstant pass.
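
For example, a minimal usage sketch of the flag described above; the toy graph, shapes, and scale/zero-point values are illustrative, not taken from this PR:

import numpy as np
import tvm
from tvm import relay

# Hypothetical constant fp32 weight that gets quantized inside the graph.
weight_fp32 = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))
weight_int8 = relay.qnn.op.quantize(
    weight_fp32,
    output_scale=relay.const(0.05),
    output_zero_point=relay.const(0),
    out_dtype="int8",
)
mod = tvm.IRModule.from_expr(weight_int8)

# Default behaviour: the qnn.quantize op is left untouched, so FQ2I can still see it.
mod_default = relay.transform.FoldConstant()(mod)

# Opt in: the constant QNN subgraph is folded down to a relay.Constant.
mod_folded = relay.transform.FoldConstant(fold_qnn=True)(mod)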

Co-authored-by: Alexander Peskov [email protected]

"""Fold the constant expressions in a Relay program.
Parameters
----------
expr: Expr
The expression to fold
mod: IRModule
The module the expr lives in (for global calls)
fskip: bool
Contributor
The argument name here doesn't match with the one in the function. Please review the other ones as well.

@ibsidorenko (Contributor Author) May 6, 2022


You are right, thank you! I will fix it.

@ibsidorenko (Contributor Author)

Fixed in 2 places

@ibsidorenko (Contributor Author)
cc: @manupa-arm @masahi @apeskov

@masahi (Member) commented May 9, 2022

@ibsidorenko There is an unrelated lint error; can you rebase and push again?

ibsidorenko and others added 2 commits May 10, 2022 10:20
This commit enables constant folding for QNN operations.
This functionality is disabled by default; use fold_qnn=True to enable it.

Co-authored-by: Alexander Peskov <[email protected]>
@manupak (Contributor) left a comment


Thanks for reviving this. The change requested as part of this review is to add more test cases covering more qnn ops, since we are claiming to support the generic fold_qnn here.

In addition to that,

I'm a bit worried about compounding Legalize and ConstFolding together and calling the union ConstFolding.

As we discussed in the other PR, I feel it's clearer to explicitly run Legalize before doing the ConstFolding. This is because I'd expect the IR after ConstFolding to simply be folded down rather than to have new ops introduced (that the original ops were legalized to). Do you have a good reason why we need to compound this behavior?

(Resolved review thread on tests/python/relay/test_pass_fold_constant.py)
@ibsidorenko (Contributor Author)

@manupa-arm I have added more simple unit tests for other QNN ops (quantize/requantize/conv2d/add/mul).
I will address your question about Legalize/ConstFolding a bit later...

@apeskov (Contributor) commented May 12, 2022

@manupa-arm

Do you have a good reason why we need to compound this behaviour?

In short, that's because of BYOC. @masahi answered this quite correctly in the previous discussion.

I will try to explain in a little more detail.

In my particular case I have to know whether a tensor is constant or not before applying the "partition_for_xxx" pass. Imagine you have a device which can process the conv2d primitive only when the weights are constant. "Constant" in that case means that the weight data are available at the device initialisation step, and the device can apply some HW-specific transformations and copy the weights into the proper HW-specific memory. Moreover, we do not know the type of weight transformation during TVM compilation because it depends on the particular type of HW and on device state.

So we have to partition the graph taking these requirements into account. The patterns may look like the following:

pat_1 = is_op("qnn.conv2d")(wildcard(), is_constant())  # Good. Strong requirements of constants
pat_2 = is_op("qnn.conv2d")(wildcard(), wildcard())     # Bad. No restrictions. Will match anywhere, with and without const

Pattern 'pat_2' is not suitable for our case because it treats the second argument as a regular var regardless of whether it is constant or not. The weight tensor would be passed to the BYOC function as a regular argument of the Run() method, not of Init(). So we would like to use 'pat_1'.

To support 'pat_1' we have to fold all constant subgraphs (like 'qnn.quantize(const_weight_fp32)') into real constants before applying the partitioner pass, otherwise the pattern will be skipped. Applying the legalization pass before constant folding would decompose 'qnn.conv2d' as well, so pattern 'pat_1' would not be matched anyway. In short, running legalization + constant folding before partitioning doesn't help.

The shortest way I found is to conditionally decompose QNN primitives only for constant subgraphs. That is equivalent to adding QNN primitives to the constant folding pass, and I think it's the right direction.

An alternative would be to introduce one more pattern helper like is_constant_subgraph() and implement lazy initialisation on the BYOC side, but that looks slightly unnatural.
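
For illustration, a sketch of how this could be wired into a BYOC partitioning helper, assuming the fold_qnn flag from this PR; the helper partition_for_my_byoc and the composite name "my_byoc.qnn_conv2d" are hypothetical:

from tvm import relay
from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

def partition_for_my_byoc(mod):
    # Fold constant QNN subgraphs (e.g. qnn.quantize(const_weight_fp32)) into
    # real constants first, so is_constant() below can match them.
    mod = relay.transform.FoldConstant(fold_qnn=True)(mod)

    # Offload only qnn.conv2d calls whose weights are compile-time constants.
    # qnn.conv2d also takes zero points and scales; wildcards cover them here.
    pat = is_op("qnn.conv2d")(
        wildcard(), is_constant(), wildcard(), wildcard(), wildcard(), wildcard()
    )
    return relay.transform.MergeComposite([("my_byoc.qnn_conv2d", pat)])(mod)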

@manupak (Contributor) commented May 12, 2022

Thanks @ibsidorenko for tests.

Thanks @apeskov for the detailed explanation.

IIUC, the requirement is to find Exprs that would be folded down to a relay.Constant (going through qnn folding in the process) while keeping the rest of the qnn ops non-legalized. I recognize this feature to be important and am happy to see it being completed. Sorry, I did not fully understand before what you were trying to achieve here.

However, looking at the test cases, I could not see that the above-mentioned feature is achieved, because all the test cases start with const arguments for qnn ops. That left me wondering why we can't achieve the same by running a sequence of Legalize, FoldConstant.

Since this change is an IRModule --> IRModule pass, I think it would be good to have a test case where we can observe that such constant subgraphs are folded while the rest of the qnn ops (those that do not have all arguments as constants) are not legalized.
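
Something along the lines of the following sketch, assuming the fold_qnn flag; the shapes, scale, and zero-point values are illustrative:

import numpy as np
import tvm
from tvm import relay

def test_fold_qnn_keeps_non_constant_qnn_ops():
    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    weight = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))

    # Constant subgraph: quantize of a constant should fold to a Constant.
    qweight = relay.qnn.op.quantize(weight, relay.const(0.05), relay.const(0), out_dtype="int8")
    # Non-constant subgraph: quantize of runtime data must stay a qnn op.
    qdata = relay.qnn.op.quantize(data, relay.const(0.05), relay.const(0), out_dtype="int8")

    mod = tvm.IRModule.from_expr(relay.Tuple([qweight, qdata]))
    mod = relay.transform.FoldConstant(fold_qnn=True)(mod)

    body = mod["main"].body
    assert isinstance(body.fields[0], relay.Constant)   # folded away
    assert body.fields[1].op.name == "qnn.quantize"     # not legalized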

@manupak (Contributor) left a comment


Following up on our discussion, here is another round of review for the PR.

(Resolved review threads on include/tvm/relay/transform.h, python/tvm/relay/transform/transform.py, and tests/python/relay/test_pass_fold_constant.py)
@apeskov (Contributor) commented May 13, 2022

@leandron @manupa-arm @masahi could someone approve the workflow to run?

@manupak (Contributor) left a comment


Thanks @ibsidorenko @apeskov! It looks great now!

@apeskov (Contributor) commented May 13, 2022

Thanks everyone for the detailed review! Glad to see 2 approvals for this PR.

@masahi @manupa-arm @jwfromm Could someone press the merge button while there is no merge conflict with main?

@masahi masahi merged commit d871bbd into apache:main May 13, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 16, 2022
* [QNN] Enable constant folding for QNN operations.

This commit enables constant folding for QNN operations.
This functionality is disabled by default; use fold_qnn=True to enable it.

Co-authored-by: Alexander Peskov <[email protected]>

* [NFC] Fixed comments

* Added more unit tests for QNN ops in the constant folding pass.

* Address PR feedback

Co-authored-by: Alexander Peskov <[email protected]>
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
shingjan pushed a commit to shingjan/tvm that referenced this pull request May 17, 2022
@ibsidorenko ibsidorenko deleted the fold-qnn-const-v2 branch May 18, 2022 07:04