[QNN] Lowering for Depthwise Convolution. #4351
Conversation
9d45f03 to 4e2ec58
@FrozenGene @jackwish @tmoreau89 A gentle ping for review :) This PR is ready for review now.
Minor comments :)
src/relay/qnn/op/convolution.cc
Outdated
if (param->kernel_layout == "OIHW") {
  out_channels = get_const_int(kernel_shape[0]);
  kernel_h = get_const_int(kernel_shape[2]);
  kernel_w = get_const_int(kernel_shape[3]);
  if (is_depthwise(param)) {
What about using a variable to hold this?
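For example, something along these lines (a hypothetical sketch of the suggestion, not the code that was actually merged):

// Hypothetical sketch: evaluate the depthwise check once and reuse it,
// instead of calling is_depthwise(param) inside each layout branch.
const bool depthwise = is_depthwise(param);
if (param->kernel_layout == "OIHW") {
  out_channels = get_const_int(kernel_shape[0]);
  kernel_h = get_const_int(kernel_shape[2]);
  kernel_w = get_const_int(kernel_shape[3]);
  if (depthwise) {
    // depthwise-specific handling of channels / channel_multiplier goes here
  }
}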
* For depthwise, we can similarly unroll the computation. The initial compute is as follows
* where cm = channel_multiplier
*
* Qc(n, oc, oh, ow) = Sigma(r, s) (Qw(oc/cm, oc%cm, r, s) - zp_w)
Suggest rewriting the related formulas with TeX: block style, for example, or inline style.
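For reference, the depthwise formula above could be typeset roughly like this (a sketch only; the second factor with Qa is reconstructed from the standard quantized depthwise definition and is not visible in the snippet above):

\[
Q_c(n, oc, oh, ow) = \sum_{r, s}
  \bigl(Q_w(\lfloor oc / cm \rfloor,\; oc \bmod cm,\; r, s) - zp_w\bigr)\,
  \bigl(Q_a(n,\; \lfloor oc / cm \rfloor,\; oh + r,\; ow + s) - zp_a\bigr)
\]

where cm is the channel multiplier and zp_w, zp_a are the weight and activation zero points.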
Thanks for the comments. I will send a separate PR for better comments. This will have to change in multiple files, I think.
  CHECK_EQ(param->dilation.size(), 2) << "qnn.conv2d only supports 2D dilation";
  auto dilation_h = get_const_int(param->dilation[0]);
  auto dilation_w = get_const_int(param->dilation[1]);
- if (dilation_h != 1 || dilation_w != 1 || param->groups != 1) {
+ if (dilation_h != 1 || dilation_w != 1 || (param->groups != 1 && !is_depthwise(param))) {
Seems is_depthwise already checked groups?
This is to differentiate between grouped convolution (old AlexNet) and depthwise convolution. We want to fall back to the simpler lowering if it's a grouped convolution, but not if it's depthwise.
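For illustration, the distinction can be expressed as a simple predicate (a hypothetical sketch, not the actual is_depthwise helper in this PR):

// Hypothetical sketch: a depthwise conv2d is a grouped conv2d whose group
// count equals the number of input channels, so every input channel is
// convolved with its own filter(s). An ordinary grouped convolution
// (e.g. the original AlexNet) has 1 < groups < in_channels and should
// fall back to the simpler lowering.
bool IsDepthwiseLike(int groups, int in_channels) {
  return groups != 1 && groups == in_channels;
}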
Sorry for this, I thought I had removed this comment after taking a closer look at it :(
- auto term3 = Conv2DThirdTerm(weight, param, batch_size, out_channels);
- auto term4 = Conv2DFourthTerm(param, batch_size, in_channels, kernel_h, kernel_w);
+ auto term3 = Conv2DThirdTerm(weight, param, out_channels);
+ auto term4 = Conv2DFourthTerm(param, in_channels, kernel_h, kernel_w);
That is interesting, why remove the batch semantics?
The batch_size was not needed at all for those two terms in the first place. I realized that while writing the lowering for depthwise.
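For context, a sketch of why that holds, using the notation of the standard QNN conv2d expansion of (Qa - zp_a) * (Qw - zp_w) (the symbols below are assumed from that expansion, not quoted from the diff):

\[
\text{term}_3(oc) = zp_a \sum_{c, r, s} Q_w(oc, c, r, s), \qquad
\text{term}_4 = C_{in} \, K_h \, K_w \, zp_a \, zp_w
\]

Neither expression involves the batch index n, so batch_size never enters their computation.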
src/relay/qnn/op/convolution.cc
Outdated
// We can reduce the H and W axis by using avg_pool2d. However, avg_pool2d averages the sum.
// Since this is integer division (floor), we can first multiply the data by the pool_size and
// then perform avg_pool2d. Reversing this causes inaccuracy due to floor division.
auto scaled_hw_t2 = Multiply(casted_t2, MakeConstantScalar(Int(32), kernel_h * kernel_w));
Is this unneeded (like L215) when kernel_h * kernel_w == 1?
Nice observation! Yes, that will save a few µs :) I will make changes for the conv2d lowering as well.
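A minimal sketch of what that shortcut could look like, reusing only the names visible in the snippet above (illustrative, not the merged change):

// Illustrative sketch: skip the pre-scaling when the pool covers a single
// element, since avg_pool2d then divides by 1 and no precision is lost.
// Otherwise, pre-multiplying by kernel_h * kernel_w makes the later floor
// division exact: floor((k * S) / k) == S, whereas k * floor(S / k) can
// lose up to k - 1.
auto scaled_hw_t2 = casted_t2;
if (kernel_h * kernel_w != 1) {
  scaled_hw_t2 = Multiply(casted_t2, MakeConstantScalar(Int(32), kernel_h * kernel_w));
}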
@jackwish Thanks! Addressed your comments :)
LGTM. Thank you @anijain2305
@zhiics Can you please review?
LGTM
Thanks @anijain2305 @jackwish
Adding the QNN lowering sequence for depthwise conv2d. This creates depthwise conv2d with (u)int8 inputs, opening up a path to write schedules using Intel VNNI and ARM DOT instructions.
For older HW that does not have fast int8 support, this lowering will not be called (already merged upstream in a different PR).
@FrozenGene @jackwish @tmoreau89 @yzhliu @zhiics