
Optimize the ERNIE inference performance on the XPU backend. #50357

Merged

Conversation

@csy0225 (Contributor) commented on Feb 9, 2023

PR types

Performance optimization

PR changes

Others

Describe

Add a cache mechanism to inference on the XPU backend, optimizing ERNIE models. ERNIE medium model performance: first inference 65 ms -> 20 ms; subsequent inferences 45.07 ms -> 2.5 ms.
Main changes:

  • Fix the logic around enable_cache_runtime_context and prepare_data in operator.cc:

  • Remove the pre_scope_ != cur_scope check. That check only applies when enable_cache_runtime_context_ == true, to decide whether the runtime_context must be regenerated; since enable_cache_runtime_context_ exists precisely to reuse the runtime_context, checking runtime_ctx_.get() == nullptr is sufficient, and the extra comparison is redundant.

  • Distinguish transfer_cache_scope from new_scope. When enable_cache_runtime_context_ == true, the variables in the runtime_context are replaced with variables from new_scope, so new_scope must exist on every run and must not be deleted; for CPU inference, however, transfer_cache_scope is unnecessary, so the two need to be kept separate.

  • Add logic to recreate the runtime_context in two cases (see the sketch after this list):

    • with enable_cache_runtime_context, on the program's first run;
    • with enable_cache_runtime_context, on subsequent runs where the input shape or layout changes at runtime.
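To make the two recreation conditions concrete, here is a minimal, self-contained C++ sketch of the reset decision. Everything in it (CachedInputSignature, the Layout enum, the standalone function signature) is an illustrative assumption, not the actual operator.cc implementation:

// Illustrative sketch only; names are hypothetical, not PaddlePaddle code.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using DimVector = std::vector<int64_t>;
enum class Layout { kNCHW, kNHWC };

// Shapes and layouts recorded when the cached runtime_context was built.
struct CachedInputSignature {
  std::map<std::string, DimVector> dims;
  std::map<std::string, Layout> layouts;
};

// Returns true when the runtime_context must be rebuilt: either no cache
// exists yet (first run), or some input's shape/layout has changed since
// the cache was built (subsequent runs).
bool NeedResetRuntimeContext(bool has_cached_context,
                             const CachedInputSignature& cached,
                             const std::map<std::string, DimVector>& cur_dims,
                             const std::map<std::string, Layout>& cur_layouts) {
  if (!has_cached_context) return true;   // case 1: first run
  return cur_dims != cached.dims ||       // case 2: input shape changed
         cur_layouts != cached.layouts;   // case 2: input layout changed
}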

paddle-bot (bot) commented on Feb 9, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@csy0225 force-pushed the inference_xpu_performance_compare branch from ee9d9f4 to 26cd92a on February 9, 2023 04:14
@csy0225 force-pushed the inference_xpu_performance_compare branch from b118c21 to f28ffab on February 10, 2023 07:15
@csy0225 force-pushed the inference_xpu_performance_compare branch from 40a3403 to ac8a2bf on February 13, 2023 07:05
@csy0225 force-pushed the inference_xpu_performance_compare branch 3 times, most recently from f3c2829 to 4b25458 on February 14, 2023 03:35
@csy0225 force-pushed the inference_xpu_performance_compare branch from 4b25458 to 2f92c7c on February 14, 2023 03:41
      std::lock_guard<std::mutex> lock(cache_update_mutex_);
      if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Review comment (Contributor):

Multi-threaded code needs to check the condition twice:

if (cond)
  lock
  if (cond)

Also, will the traversal-style lookup and comparison significantly increase CPU load compared with the original code?
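For reference, a minimal sketch of the double-checked locking pattern the reviewer is describing (illustrative only, not the merged code; a production version would guard the unlocked read with std::atomic or std::call_once to avoid a data race):

#include <memory>
#include <mutex>

struct RuntimeContext {};  // hypothetical stand-in for the cached context

class Op {
 public:
  RuntimeContext* GetOrCreateRuntimeContext() {
    // First check without the lock keeps the common (already-cached) path cheap.
    if (runtime_ctx_ == nullptr) {
      std::lock_guard<std::mutex> lock(cache_update_mutex_);
      // Second check under the lock: another thread may have built the
      // context between the unlocked check and acquiring the mutex.
      if (runtime_ctx_ == nullptr) {
        runtime_ctx_ = std::make_unique<RuntimeContext>();
      }
    }
    return runtime_ctx_.get();
  }

 private:
  std::unique_ptr<RuntimeContext> runtime_ctx_;
  std::mutex cache_update_mutex_;
};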

  } else if (run_phi_kernel_ && impl_ != nullptr && !need_prepare_data_ &&
             !need_prepare_phi_data_) {
    if (!all_kernels_must_compute_runtime_shape_ && impl_->NeedInferShape()) {
      this->Info().infer_shape_(impl_->getRuntimeInferShapeContext());
    }
    (*phi_kernel_)(impl_->getKernelContext());
  } else {
-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
Review comment (Contributor):

It's unclear how much overhead this adds. Whenever this branch is taken, every run now fetches vars from the scope, which could become a new hotspot.

On the same hunk, at the removed line:

-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Review comment (Contributor):

Is the pre_scope_ != cur_scope case equivalent to the input tensor's shape or layout having changed?

@csy0225 (Author) replied on Feb 15, 2023:

They are somewhat similar, but in the previous code, with enable_cache_runtime_context, whenever any tensor changed (e.g. cpu->mkldnn), pre_scope_ was forcibly set to null, so every run had to recreate the runtime_context and re-run prepare_data, which is very time-consuming; the cache was effectively disabled. Yet after the first run, a second run whose tensor shapes and layouts are unchanged can still use the cache. This PR fixes that problem.

The reviewer replied:

Is there a case where pre_scope_ is not null but is not the same as the current scope?

@csy0225 (Author) replied:

From the code logic, no: the first time an op runs, pre_scope_ is assigned cur_scope, and since there is no other assignment, pre_scope_ can only be cur_scope.

The reviewer replied:

While debugging, I've previously seen the same op receive two different scopes on consecutive runs; I'm not sure whether pre_scope_ would then disagree with the scope and trigger an update.

On the same hunk:

-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
Review comment (Contributor):

This interface is called on every inference, so the impact surface is large. Could it cause performance regressions on other hardware? Is the cache also enabled for Inference+GPU now?

@csy0225 (Author) replied:

Yes, I've discussed this with the QA team. They need to build and validate a release package soon, so this will be merged into develop for now; performance will be tracked via the develop branch benchmark tests, and they'll contact me if any issue turns up.

@hong19860320 (Contributor) left a comment:

LGTM

@qili93 (Contributor) left a comment:

LGTM

@qili93 merged commit b39afb1 into PaddlePaddle:develop on Feb 21, 2023
@csy0225 mentioned this pull request on Feb 25, 2023