OneDNN hardswish integration #30211

Merged: 19 commits into PaddlePaddle:develop on Feb 25, 2021
Conversation

jakpiase (Contributor) commented Jan 7, 2021

PR types

New features

PR changes

OPs

Describe

Added support for oneDNN hardswish activation function. Conv + activation and fc + activation fuse passes can now also fuse with hardswish activation.
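
For reference, here is a minimal sketch of the activation being fused. This is the common hard_swish definition (with the usual defaults threshold = 6, scale = 6, offset = 3), not Paddle's or oneDNN's kernel code:

#include <algorithm>

// hard_swish(x) = x * min(max(x + offset, 0), threshold) / scale
float hard_swish(float x, float threshold = 6.f, float scale = 6.f,
                 float offset = 3.f) {
  return x * std::min(std::max(x + offset, 0.f), threshold) / scale;
}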

Profiled on Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz
warmup = 10, repeat = 100

  • CPU native: [profiler output screenshot]

  • oneDNN without hardswish: [profiler output screenshot]

  • oneDNN with hardswish: [profiler output screenshot]

Total times:
oneDNN without hardswish / oneDNN with hardswish = 1.19
CPU native / oneDNN with hardswish = 2.76

paddle-bot-old bot commented Jan 7, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

jakpiase (Contributor, Author) commented Jan 7, 2021

@jczaja Could you please take a look?

jakpiase marked this pull request as ready for review on January 26, 2021, 17:39.
jakpiase changed the title from "[DO NOT MERGE] OneDNN hardswish integration" to "OneDNN hardswish integration" on Jan 26, 2021.
lidanqing-intel (Contributor) commented:

@jakpiase Please do profiling with the following config, thanks:

void PrepareConfig(AnalysisConfig *config, int threads) {
  ...
  // Enable oneDNN (MKL-DNN) kernels for CPU inference.
  config->EnableMKLDNN();
  // Run the interpolate ops through oneDNN as well.
  auto pass_builder = config->pass_builder();
  pass_builder->AppendPass("interpolate_mkldnn_pass");
}
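
A hedged sketch of how such a config is typically consumed end to end with the Paddle C++ inference API; the model path and thread count below are placeholders, not values from this PR:

#include "paddle_inference_api.h"

int main() {
  paddle::AnalysisConfig config;
  config.SetModel("./model_dir");        // placeholder model path
  config.SetCpuMathLibraryNumThreads(4); // placeholder thread count
  config.EnableMKLDNN();
  config.pass_builder()->AppendPass("interpolate_mkldnn_pass");

  // Graph passes, including the conv + hardswish fuse, run when the
  // predictor is created.
  auto predictor = paddle::CreatePaddlePredictor(config);
  return 0;
}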

lidanqing-intel (Contributor) commented Jan 27, 2021

I profiled on my i9 machine, warmup = 10, repeat = 100

  • CPU Native
Total time: 29079.5
  Computation time       Total: 28937.1     Ratio: 99.5103%
  Framework overhead     Total: 142.393     Ratio: 0.48967%
-------------------------     GpuMemCpy Summary     -------------------------
GpuMemcpy                Calls: 0           Total: 0           Ratio: 0%
-------------------------       Event Summary       -------------------------
Event                            Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d                  4510        13317.5     0.116527    67.3944     2.95288     0.457968
thread0::depthwise_conv2d        1650        8328.15     0.78423     12.7155     5.04736     0.286392
thread0::elementwise_add         5280        2598.19     0.026334    7.5885      0.492081    0.0893477
thread0::nearest_interp          660         1365.26     0.276725    11.2304     2.06858     0.0469493
thread0::conv2d_transpose        220         1134.83     2.17559     8.3012      5.15834     0.0390252
thread0::relu                    1540        1097.54     0.177974    6.24986     0.71269     0.0377428
thread0::hard_swish              2200        471.471     0.080303    1.49519     0.214305    0.0162131
thread0::batch_norm              1650        406.157     0.096041    1.57076     0.246156    0.0139671
thread0::concat                  110         258.622     2.32048     2.42123     2.35111     0.00889361
thread0::sigmoid                 110         71.0762     0.619227    0.693527    0.646148    0.0024442
thread0::scale                   110         29.1552     0.250245    0.313208    0.265047    0.0010026
thread0::load_combine            1           1.56114     1.56114     1.56114     1.56114     5.36851e-05
  • oneDNN without hard_swish
Total time: 15882.3
  Computation time       Total: 14388.5     Ratio: 90.5948%
  Framework overhead     Total: 1493.76     Ratio: 9.40523%
-------------------------     GpuMemCpy Summary     -------------------------
GpuMemcpy                Calls: 0           Total: 0           Ratio: 0%
-------------------------       Event Summary       -------------------------
Event                            Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d                  6160        7719.16     0.147497    30.6164     1.25311     0.486023
  int_reorder                    2476        891.356     0.002698    4.12293     0.030945    0.115473*
thread0::conv2d_transpose        220         3396.88     11.9331     34.8972     15.4403     0.213878
  int_reorder                    2           0.211463    0.014027    0.197436    0.197436    6.22522e-05*
thread0::sigmoid                 110         1372.17     12.2731     20.0907     12.4743     0.0863962
thread0::hard_swish              2200        1104.43     0.170863    9.0632      0.502012    0.0695382
  ext_reorder                    2200        525.534     0.066959    4.2296      4.2296      0.475844*
thread0::scale                   110         709.301     6.35583     6.61771     6.44819     0.0446599
  ext_reorder                    110         357.398     3.18237     3.34164     3.29843     0.503874*
thread0::relu                    110         443.658     3.94713     7.9087      4.03325     0.0279341
thread0::elementwise_add         330         432.945     0.178199    5.61183     1.31196     0.0272596
thread0::concat                  110         391.932     3.49597     6.85412     3.56302     0.0246773
thread0::nearest_interp          660         310.28      0.103491    5.83737     0.470121    0.0195362
thread0::load_combine            1           1.54059     1.54059     1.54059     1.54059     9.70004e-05
  • oneDNN with hard_swish
Total time: 11505.1
  Computation time       Total: 11058.6     Ratio: 96.1186%
  Framework overhead     Total: 446.56      Ratio: 3.88139%
-------------------------     GpuMemCpy Summary     -------------------------
GpuMemcpy                Calls: 0           Total: 0           Ratio: 0%
-------------------------       Event Summary       -------------------------
Event                            Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d                  6160        5998.91     0.160963    20.5096     0.973849    0.521411
  int_reorder                    166         136.692     0.004007    6.61985     6.61985     0.0227862*
thread0::conv2d_transpose        220         2864.84     11.1971     32.519      13.022      0.249005
  int_reorder                    112         115.58      0.018428    1.95845     1.95845     0.0403442*
thread0::sigmoid                 110         1015.53     9.02599     18.5955     9.2321      0.0882675
thread0::elementwise_add         330         429.875     0.177166    5.57589     1.30265     0.0373637
thread0::relu                    110         374.766     3.33516     7.332       3.40697     0.0325738
thread0::scale                   110         360.637     3.23246     3.97534     3.27852     0.0313457
  ext_reorder                    110         318.88      2.85873     3.06138     3.06138     0.884211*
thread0::nearest_interp          660         232.959     0.102815    5.89546     0.352969    0.0202483
thread0::concat                  110         226.047     2.00791     4.95641     2.05498     0.0196475
thread0::load_combine            1           1.58235     1.58235     1.58235     1.58235     1.37535e-04

Performance improvement:
This PR (oneDNN without hard_swish / oneDNN with hard_swish, computation time): 14388 / 11058 = 1.30X
CPU native / oneDNN with hard_swish = 2.6X

paddle-bot-old bot commented Feb 5, 2021

Sorry to inform you that the CIs for d4e524e passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

juncaipeng (Contributor) commented:

@lidanqing-intel Please verify the accuracy with OneDNN hardswish enabled.

jczaja (Contributor) left a comment:

LGTM

jczaja (Contributor) commented Feb 24, 2021

@luotao1 Could you please start your review?

luotao1 merged commit 2f11653 into PaddlePaddle:develop on Feb 25, 2021.
lidanqing-intel (Contributor) commented Feb 25, 2021

@jakpiase Juncai asked to cherry-pick this PR to release/2.0.

lidanqing-intel (Contributor) commented:

@jakpiase
Update: Since cherry-picking this PR requires upgrading to oneDNN 2.2, and release/2.0 is far from oneDNN 2.2, do not cherry-pick.

lidanqing-intel pushed a commit to lidanqing-intel/Paddle that referenced this pull request Mar 25, 2021
Superjomn pushed a commit that referenced this pull request Mar 31, 2021
* OneDNN hardswish integration (#30211)

* keep only conv + hardswish in this PR

Co-authored-by: jakpiase <[email protected]>