
Merge momentum ops/kernels #36380

Merged: 10 commits into PaddlePaddle:develop on Oct 14, 2021
Conversation

@sneaxiy (Collaborator) commented Oct 12, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Merge multiple momentum ops/kernels into a single momentum op/kernel.

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@sneaxiy changed the title from [WIP] Merge momentum ops to Merge momentum ops on Oct 13, 2021
template <typename T, typename MT, bool kHasMasterParams,
          uint32_t kParamNum = kHasMasterParams ? 55 : 110>
struct MergedMomentumKernelParam
    : public MergedMomentumMasterParams<MT, kParamNum, kHasMasterParams> {
sneaxiy (Collaborator, Author) commented:

The inheritance here exploits a C++ compiler feature: inheriting from an empty class does not increase the derived class's sizeof (the empty base optimization). So when MasterParams is not needed, sizeof(MergedMomentumKernelParam) is smaller and the struct can accommodate a larger kParamNum.
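
A minimal standalone sketch of that empty-base trick; the names below (MasterParamsBase, KernelParam) are hypothetical stand-ins, not the actual Paddle types:

#include <cstdint>
#include <cstdio>

// Primary template carries the master-parameter pointers.
template <typename MT, uint32_t kParamNum, bool kHasMasterParams>
struct MasterParamsBase {
  MT *master_params[kParamNum];
};

// Specialization for the no-master-params case: an empty class.
template <typename MT, uint32_t kParamNum>
struct MasterParamsBase<MT, kParamNum, false> {};

template <typename T, typename MT, bool kHasMasterParams,
          uint32_t kParamNum = kHasMasterParams ? 4 : 8>
struct KernelParam : public MasterParamsBase<MT, kParamNum, kHasMasterParams> {
  T *params[kParamNum];
};

int main() {
  // The empty base adds zero bytes, so the false case spends its whole
  // size budget on params, allowing a larger kParamNum.
  printf("%zu\n", sizeof(KernelParam<float, float, true>));   // 4 params + 4 master params
  printf("%zu\n", sizeof(KernelParam<float, float, false>));  // 8 params, no master params
}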

Contributor replied:

Learned something new!

auto master_params = ctx.MultiInput<framework::Tensor>("MasterParam");
auto master_params_out =
    ctx.MultiOutput<framework::Tensor>("MasterParamOut");
auto multi_precision = ctx.Attr<bool>("multi_precision");
Contributor commented:

Does this check mean that the Python side will collect the AMP and non-AMP optimizer parameters separately, and then pass each group in separately to run the optimizer computation?

sneaxiy (Collaborator, Author) replied:

Yes. Separating AMP from non-AMP was chosen for several reasons (see the sketch after this list):

  • OperatorWithKernel::GetExpectedKernelType is easier to write. Otherwise it would have to walk the types of all Param inputs: if the types are mixed (FP16 and FP32) it must return the FP16 kernel, and only if the types are uniform can it dispatch to the single-type kernel. That would be very cumbersome to write.
  • sizeof(MergedMomentumKernelParam) stays smaller, since the struct does not need to carry a bool multi_precision[N] array, and the implementation avoids if-else checks for mixed types everywhere.
  • With mixed types, the params and grads members of MergedMomentumKernelParam could only be declared as void *params[N] and void *grads[N], which hurts code readability.
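
A hedged sketch of the trade-off in the last two bullets; both structs are illustrative, not the actual Paddle definitions:

#include <cstddef>

constexpr std::size_t N = 8;  // illustrative capacity

// Uniform-type case (what this PR assumes): pointers keep their element type.
template <typename T>
struct TypedParams {
  T *params[N];
  const T *grads[N];
};

// Mixed FP16/FP32 case: element types are erased, every access needs a
// runtime flag check plus a cast, and the per-slot flags cost extra bytes.
struct TypeErasedParams {
  void *params[N];
  const void *grads[N];
  bool multi_precision[N];
};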

  static constexpr auto N = kParamNum;
  size_t sizes[N];
  T *params[N];
  const T *grads[N];
Contributor commented:

Adding the __restrict__ keyword to read-only data like these const T * pointers can give a small performance gain, though the improvement may not be noticeable: https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/
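
A minimal CUDA sketch of the suggestion on a free-standing kernel (the kernel name and update rule are hypothetical, not the merged-momentum kernel itself):

// __restrict__ promises the compiler that grad and param never alias,
// so loads can be cached in registers and memory ops reordered. See
// https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/
__global__ void AxpyKernel(const float *__restrict__ grad,
                           float *__restrict__ param,
                           float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    param[i] -= lr * grad[i];  // simple SGD-style update for illustration
  }
}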

sneaxiy (Collaborator, Author) replied:

Done.

@sneaxiy changed the title from Merge momentum ops to Merge momentum ops/kernels on Oct 13, 2021
@sneaxiy closed this on Oct 13, 2021
@sneaxiy reopened this on Oct 13, 2021
@PaddlePaddle locked and limited conversation to collaborators on Oct 13, 2021
@PaddlePaddle unlocked this conversation on Oct 13, 2021
@JamesLim-sy (Contributor) left a comment:

LGTM COOL!

@TCChenlong (Contributor) left a comment:

LGTM

@sneaxiy merged commit f4eda86 into PaddlePaddle:develop on Oct 14, 2021