[Speed]Refine elementwise_mul_op gradient functor #8810

Merged

@chengduoZH chengduoZH commented Mar 6, 2018

fix #8811
Refine se_resnet_152 -> elementwise_mul_op gradient-functor
related issue: #8661

Total time spent in elementwise_mul_grad dropped by roughly 9x: 3164.52 ms --> 352.164 ms
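
For readers unfamiliar with the operator, the following is a minimal sketch of what a per-element gradient functor for `out = x * y` computes: `dx = dout * y` and `dy = dout * x`. The names `MulGradDX`, `MulGradDY`, and `ElementwiseGradSameShape` are hypothetical and chosen for illustration only. The four-argument call signature mirrors the functor quoted in the review comments of this PR, while the `HOSTDEVICE` qualifier, GPU dispatch, and broadcasting are left out. This is not the code merged here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor: gradient of out = x * y with respect to x.
template <typename T>
struct MulGradDX {
  T operator()(T /*x*/, T y, T /*out*/, T dout) const { return dout * y; }
};

// Hypothetical functor: gradient of out = x * y with respect to y.
template <typename T>
struct MulGradDY {
  T operator()(T x, T /*y*/, T /*out*/, T dout) const { return dout * x; }
};

// Hypothetical driver for the same-shape case (no broadcasting):
// applies both functors once per element in a single pass.
template <typename T, typename DXOp, typename DYOp>
void ElementwiseGradSameShape(const std::vector<T>& x,
                              const std::vector<T>& y,
                              const std::vector<T>& out,
                              const std::vector<T>& dout,
                              std::vector<T>* dx, std::vector<T>* dy,
                              DXOp dx_op, DYOp dy_op) {
  const std::size_t n = x.size();
  assert(y.size() == n && out.size() == n && dout.size() == n);
  dx->resize(n);
  dy->resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    (*dx)[i] = dx_op(x[i], y[i], out[i], dout[i]);
    (*dy)[i] = dy_op(x[i], y[i], out[i], dout[i]);
  }
}
```

Computing both gradients in one loop reads each input element once, which is one plausible reason a functor-based kernel can beat separate whole-tensor expressions. The profile numbers in this PR do not by themselves say where the speedup came from.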

Profiling script:

https://github.com/dzhwinter/benchmark/pull/83/files

Before optimization:

------------------------->     Profiling Report     <-------------------------

Place: All
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::conv2d_grad             8007        16130.5     0.409856    105.679     2.01455
thread0::pool2d_grad             2652        5440.28     0.20384     14.7589     2.05139
thread0::conv2d                  8007        5274.17     0.168032    117.607     0.658695
thread0::elementwise_mul_grad    2550        3164.52     0.352928    4.35414     1.24099
thread0::sum                     5100        1403.87     0.1008      0.902272    0.275268
thread0::batch_norm_grad         8007        1308.53     0.044288    1.41232     0.163423
thread0::batch_norm              8007        1079.35     0.049216    1.16714     0.134801
thread0::elementwise_mul         36873       984.829     0.003072    7.73997     0.0267087
...

After optimization:

Event                            Calls       Total       Min.        Max.        Ave.
thread0::conv2d_grad             8007        16069       0.40544     113.233     2.00687
thread0::pool2d_grad             2652        5413.14     0.203456    14.8475     2.04115
thread0::conv2d                  8007        5256.77     0.16912     115.189     0.656521
thread0::sum                     5100        1402.64     0.101024    0.879392    0.275028
thread0::batch_norm_grad         8007        1310.68     0.044832    1.4183      0.163692
thread0::batch_norm              8007        1073.83     0.049376    1.15424     0.134111
thread0::elementwise_mul         36873       991.32      0.003072    0.266528    0.0268847
thread0::momentum                34323       825.496     0.00336     1.07974     0.0240508
thread0::relu_grad               10353       819.45      0.00336     0.772896    0.079151
thread0::relu                    10353       581.015     0.003264    0.56944     0.0561204
thread0::elementwise_mul_grad    2550        352.164     0.061568    0.356768    0.138103
thread0::elementwise_add         7701        311.604     0.003648    0.365344    0.0404628
thread0::elementwise_add_grad    7701        308.575     0.003872    0.337504    0.0400694
...

@jacquesqiao jacquesqiao left a comment

LGTM! Great improvement.

@chengduoZH chengduoZH merged commit c43995e into PaddlePaddle:develop Mar 7, 2018
@chengduoZH chengduoZH changed the title from "[Speed]Refine elementwise_mul_op" to "[Speed]Refine elementwise_mul_op gradient functor" Mar 7, 2018
dy_e.device(d) = x_e * dz_e;
}
}
struct IdentityGrad_DX {
Collaborator

This is not an IdentityGrad functor.
Please use a better name.

Contributor Author

It has been renamed here.

}
}
struct IdentityGrad_DY {
HOSTDEVICE T operator()(T x, T y, T out, T dout) const { return dout * x; }
Collaborator

This is not an IdentityGrad functor either.

Successfully merging this pull request may close these issues.

[Speed] Optimize elementwise_mul_op gradient functor