Merge momentum ops/kernels #36380
Conversation
Thanks for your contribution!
```cpp
template <typename T, typename MT, bool kHasMasterParams,
          uint32_t kParamNum = kHasMasterParams ? 55 : 110>
struct MergedMomentumKernelParam
    : public MergedMomentumMasterParams<MT, kParamNum, kHasMasterParams> {
```
The inheritance here exploits a C++ compiler feature: inheriting from an empty class does not increase the derived class's sizeof (the empty base optimization). So when MasterParams is not needed, sizeof(MergedMomentumKernelParam) stays smaller, leaving room for a larger kParamNum.
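A minimal standalone sketch of the empty-base trick described above (the names here are illustrative stand-ins, not the PR's actual types): the specialization for `kHasMasterParams == false` is an empty struct, so inheriting it adds zero bytes to the derived struct.

```cpp
#include <cstdio>

// Illustrative stand-in for MergedMomentumMasterParams: the
// false-specialization is empty, so inheriting it costs no space.
template <typename MT, int N, bool kHasMasterParams>
struct MasterParamsSketch {
  MT *master_params[N];
};

template <typename MT, int N>
struct MasterParamsSketch<MT, N, false> {};  // empty base, zero bytes

template <bool kHasMasterParams, int N = kHasMasterParams ? 2 : 4>
struct KernelParamSketch
    : public MasterParamsSketch<float, N, kHasMasterParams> {
  float *params[N];
};

int main() {
  // Both print 32 on a typical 64-bit target: dropping the master
  // params lets the struct hold twice as many parameter slots for
  // the same sizeof, which is exactly why kParamNum can grow.
  printf("%zu\n", sizeof(KernelParamSketch<true>));   // 2 masters + 2 params
  printf("%zu\n", sizeof(KernelParamSketch<false>));  // 4 params, empty base
}
```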
Learned something new~
```cpp
auto master_params = ctx.MultiInput<framework::Tensor>("MasterParam");
auto master_params_out =
    ctx.MultiOutput<framework::Tensor>("MasterParamOut");
auto multi_precision = ctx.Attr<bool>("multi_precision");
```
Does this check mean the Python side will collect the AMP and non-AMP optimizer parameters separately, and then pass each group in for its own optimizer computation?
Yes. AMP and non-AMP are kept separate for a few reasons:

1. OperatorWithKernel::GetExpectedKernelType becomes much easier to write (see the sketch after this list). Otherwise it would have to walk the types of every Param: if the types are mixed (FP16 and FP32) it must return the FP16 kernel, and if they are uniform it must return the single-type kernel, which would be very tedious to implement.
2. sizeof(MergedMomentumKernelParam) is smaller, since the struct no longer has to carry bool multi_precision[N], and the implementation avoids scattering if-else checks for mixed types everywhere.
3. With mixed types, the params and grads members of MergedMomentumKernelParam could only be declared as void *params[N] and void *grads[N], which hurts readability.
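A hedged sketch of reason 1 (Paddle-style code; the op class in the PR may differ): when AMP and non-AMP parameters arrive in separate op instances, every Param shares one dtype, so kernel selection can inspect a single input instead of scanning all Params for FP16/FP32 mixes.

```cpp
// Sketch only: assumes Paddle's OperatorWithKernel API; not the PR's
// verbatim implementation.
class MergedMomentumOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  framework::OpKernelType GetExpectedKernelType(
      const framework::ExecutionContext &ctx) const override {
    // All Param inputs share one dtype by construction, so the first
    // one is enough to pick the kernel -- no mixed-type scan needed.
    auto dtype = OperatorWithKernel::IndicateVarDataType(ctx, "Param");
    return framework::OpKernelType(dtype, ctx.GetPlace());
  }
};
```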
```cpp
static constexpr auto N = kParamNum;
size_t sizes[N];
T *params[N];
const T *grads[N];
```
Adding the __restrict__ qualifier to read-only data like these const T * pointers gives a slight performance improvement, though the gain may not be noticeable: https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/
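For reference, a minimal CUDA kernel (illustrative, not from this PR) showing the suggested qualifier: marking the read-only inputs const T *__restrict__ promises the compiler they never alias the output, which can enable read-only cache loads and freer instruction scheduling.

```cpp
// Illustrative SAXPY-style kernel; __restrict__ on x and y tells nvcc
// the loads cannot alias out[i], so they may be hoisted or cached.
template <typename T>
__global__ void AxpySketch(const T *__restrict__ x,
                           const T *__restrict__ y,
                           T *__restrict__ out, T alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = alpha * x[i] + y[i];
  }
}
```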
Done.
Force-pushed bd25e1b to ca2ecae
LGTM COOL!
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Merge multiple momentum ops/kernels into a single momentum op/kernel.
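As a rough illustration of what merging buys (a hedged sketch, not the PR's actual kernel; the velocitys member and the launch shape are assumptions for illustration): a single launch walks all N parameter tensors packed into the MergedMomentumKernelParam-style struct quoted above, replacing N separate momentum kernel launches.

```cpp
// Sketch: one grid updates every merged tensor with the classic
// momentum rule v = mu * v + g; p -= lr * v.  The sizes/params/grads
// fields match the struct quoted above; velocitys is assumed here.
template <typename T, typename KernelParamT>
__global__ void MergedMomentumSketch(KernelParamT kp, T lr, T mu) {
  for (uint32_t idx = 0; idx < KernelParamT::N; ++idx) {
    size_t n = kp.sizes[idx];
    T *param = kp.params[idx];
    const T *grad = kp.grads[idx];
    T *velocity = kp.velocitys[idx];  // assumed member, see lead-in
    // Grid-stride loop so every tensor reuses the same launch config.
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x) {
      T v = mu * velocity[i] + grad[i];
      velocity[i] = v;
      param[i] -= lr * v;
    }
  }
}
```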