[Speed]Refine elementwise_mul_op gradient functor #8810

chengduoZH · 2018-03-06T16:09:35Z

Fix #8811.
Refine the gradient functor of elementwise_mul_op, which shows up prominently in the se_resnet_152 profile.
Related issue: #8661

The total time spent in elementwise_mul_grad drops by roughly 9x: 3164.52 ms --> 352.164 ms.
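A quick check of the figures taken from the two profiling reports: elementwise_mul_grad is called 2550 times in both runs, so the total time and the per-call average shrink by the same factor, just under 9x, and about 2.8 s of profiled time is saved.

```latex
\frac{3164.52\ \text{ms}}{352.164\ \text{ms}} \approx 8.99,
\qquad
\frac{1.24099\ \text{ms/call}}{0.138103\ \text{ms/call}} \approx 8.99,
\qquad
3164.52 - 352.164 = 2812.356\ \text{ms}
```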

Profile script: https://github.com/dzhwinter/benchmark/pull/83/files

Before optimization:

------------------------->     Profiling Report     <-------------------------

Place: All
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::conv2d_grad             8007        16130.5     0.409856    105.679     2.01455
thread0::pool2d_grad             2652        5440.28     0.20384     14.7589     2.05139
thread0::conv2d                  8007        5274.17     0.168032    117.607     0.658695
thread0::elementwise_mul_grad    2550        3164.52     0.352928    4.35414     1.24099
thread0::sum                     5100        1403.87     0.1008      0.902272    0.275268
thread0::batch_norm_grad         8007        1308.53     0.044288    1.41232     0.163423
thread0::batch_norm              8007        1079.35     0.049216    1.16714     0.134801
thread0::elementwise_mul         36873       984.829     0.003072    7.73997     0.0267087
...

After optimization:

Event                            Calls       Total       Min.        Max.        Ave.
thread0::conv2d_grad             8007        16069       0.40544     113.233     2.00687
thread0::pool2d_grad             2652        5413.14     0.203456    14.8475     2.04115
thread0::conv2d                  8007        5256.77     0.16912     115.189     0.656521
thread0::sum                     5100        1402.64     0.101024    0.879392    0.275028
thread0::batch_norm_grad         8007        1310.68     0.044832    1.4183      0.163692
thread0::batch_norm              8007        1073.83     0.049376    1.15424     0.134111
thread0::elementwise_mul         36873       991.32      0.003072    0.266528    0.0268847
thread0::momentum                34323       825.496     0.00336     1.07974     0.0240508
thread0::relu_grad               10353       819.45      0.00336     0.772896    0.079151
thread0::relu                    10353       581.015     0.003264    0.56944     0.0561204
thread0::elementwise_mul_grad    2550        352.164     0.061568    0.356768    0.138103
thread0::elementwise_add         7701        311.604     0.003648    0.365344    0.0404628
thread0::elementwise_add_grad    7701        308.575     0.003872    0.337504    0.0400694
...

jacquesqiao

LGTM! Great improvement.

reyoung · 2018-03-07T03:38:02Z

paddle/fluid/operators/elementwise_mul_op.h

-      dy_e.device(d) = x_e * dz_e;
-    }
-  }
+struct IdentityGrad_DX {


This is not an IdentityGrad functor.
Please use a better name.

It has been renamed here.
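For reference, the gradients these functors compute follow from the chain rule for an elementwise product. With z = x ⊙ y and upstream gradient dz (`dout` in the functor signature), neither gradient is the identity, which is the point of the naming comment. The dy case agrees with the removed Eigen line `dy_e.device(d) = x_e * dz_e;` and with `return dout * x;` in the new functor.

```latex
z_i = x_i\, y_i
\;\Rightarrow\;
\frac{\partial L}{\partial x_i}
  = \frac{\partial L}{\partial z_i}\,\frac{\partial z_i}{\partial x_i}
  = dz_i\, y_i,
\qquad
\frac{\partial L}{\partial y_i}
  = \frac{\partial L}{\partial z_i}\,\frac{\partial z_i}{\partial y_i}
  = dz_i\, x_i
```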

reyoung · 2018-03-07T03:38:23Z

paddle/fluid/operators/elementwise_mul_op.h

-    }
-  }
+struct IdentityGrad_DY {
+  HOSTDEVICE T operator()(T x, T y, T out, T dout) const { return dout * x; }


This is not an IdentityGrad functor either.
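A minimal host-only sketch of the functor pattern under review, assuming the same-shape case with no broadcasting. `MulGradDX`, `MulGradDY`, and `ElementwiseGrad` are hypothetical names chosen for illustration, and `HOSTDEVICE` is stubbed as an empty macro so the snippet builds without CUDA; this is not Paddle's actual implementation. Only the `(x, y, out, dout)` signature is taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: in a CUDA build HOSTDEVICE would expand to __host__ __device__;
// here it is empty so the sketch compiles as plain C++.
#ifndef HOSTDEVICE
#define HOSTDEVICE
#endif

// Hypothetical gradient functor for z = x * y: dL/dx = dout * y.
template <typename T>
struct MulGradDX {
  HOSTDEVICE T operator()(T x, T y, T out, T dout) const { return dout * y; }
};

// Hypothetical gradient functor for z = x * y: dL/dy = dout * x.
template <typename T>
struct MulGradDY {
  HOSTDEVICE T operator()(T x, T y, T out, T dout) const { return dout * x; }
};

// Hypothetical driver: a single pass over same-shaped inputs that applies
// both functors per element. Broadcasting (which would require summing dy
// over the broadcast dimensions) is deliberately not modeled.
template <typename T, typename DX, typename DY>
void ElementwiseGrad(const std::vector<T>& x, const std::vector<T>& y,
                     const std::vector<T>& out, const std::vector<T>& dout,
                     std::vector<T>* dx, std::vector<T>* dy,
                     DX dx_op, DY dy_op) {
  assert(x.size() == y.size() && x.size() == out.size() &&
         x.size() == dout.size());
  dx->resize(x.size());
  dy->resize(y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    (*dx)[i] = dx_op(x[i], y[i], out[i], dout[i]);
    (*dy)[i] = dy_op(x[i], y[i], out[i], dout[i]);
  }
}
```

For example, with x = {1, 2, 3}, y = {4, 5, 6}, out = {4, 10, 18} and dout = {1, 1, 2}, calling `ElementwiseGrad(x, y, out, dout, &dx, &dy, MulGradDX<float>(), MulGradDY<float>())` yields dx = {4, 5, 12} and dy = {1, 2, 6}. Passing the functors as template parameters lets the compiler inline the per-element call.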

refine elementwise_mul_op

a1331f9

chengduoZH requested review from reyoung, qingqing01 and jacquesqiao March 6, 2018 16:10

jacquesqiao approved these changes Mar 7, 2018

chengduoZH merged commit c43995e into PaddlePaddle:develop Mar 7, 2018

chengduoZH changed the title ~~[Speed]Refine elementwise_mul_op~~ [Speed]Refine elementwise_mul_op gradient functor Mar 7, 2018

reyoung reviewed Mar 7, 2018

chengduoZH mentioned this pull request Mar 7, 2018

[Speed] Refine elementwise sub,div,min,max gradient functor #8820

Merged
