
Optimize the ERNIE inference performance on the XPU backend. #50357

Merged

Conversation

@csy0225 (Contributor) commented on Feb 9, 2023

PR types

Performance optimization

PR changes

Others

Describe

Add a cache mechanism to inference on the XPU backend, optimizing ERNIE models. ERNIE medium model performance: first inference 65 ms -> 20 ms; subsequent inferences 45.07 ms -> 2.5 ms.
Main changes:

  • Fix the logic around enable_cache_runtime_context and prepare_data in operator.cc:

  • Remove the pre_scope_ != cur_scope check. That check only applies when enable_cache_runtime_context_ == true, to decide whether the runtime_context must be regenerated; since enable_cache_runtime_context_ exists precisely to reuse the runtime_context, checking runtime_ctx_.get() == nullptr is sufficient, and the extra comparison is redundant.

  • Distinguish transfer_cache_scope from new_scope. When enable_cache_runtime_context_ == true, the variables in the runtime_context are replaced with variables from new_scope, so new_scope must exist on every run and must not be deleted; for CPU inference, however, transfer_cache_scope is unnecessary, so the two need to be kept separate.

  • Add logic to recreate the runtime_context in two cases (see the sketch after this list):

    • with enable_cache_runtime_context, on the program's first run;
    • with enable_cache_runtime_context, on subsequent runs where the input shape or layout changes at runtime.
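To make the two recreation conditions concrete, here is a minimal, self-contained C++ sketch of the reset decision. Everything in it (CachedInputSignature, the Layout enum, the standalone function signature) is an illustrative assumption, not the actual operator.cc implementation:

// Illustrative sketch only; names are hypothetical, not PaddlePaddle code.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using DimVector = std::vector<int64_t>;
enum class Layout { kNCHW, kNHWC };

// Shapes and layouts recorded when the cached runtime_context was built.
struct CachedInputSignature {
  std::map<std::string, DimVector> dims;
  std::map<std::string, Layout> layouts;
};

// Returns true when the runtime_context must be rebuilt: either no cache
// exists yet (first run), or some input's shape/layout has changed since
// the cache was built (subsequent runs).
bool NeedResetRuntimeContext(bool has_cached_context,
                             const CachedInputSignature& cached,
                             const std::map<std::string, DimVector>& cur_dims,
                             const std::map<std::string, Layout>& cur_layouts) {
  if (!has_cached_context) return true;   // case 1: first run
  return cur_dims != cached.dims ||       // case 2: input shape changed
         cur_layouts != cached.layouts;   // case 2: input layout changed
}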

paddle-bot (bot) commented on Feb 9, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@csy0225 force-pushed the inference_xpu_performance_compare branch from ee9d9f4 to 26cd92a on February 9, 2023 04:14
@csy0225 force-pushed the inference_xpu_performance_compare branch from b118c21 to f28ffab on February 10, 2023 07:15
@csy0225 force-pushed the inference_xpu_performance_compare branch from 40a3403 to ac8a2bf on February 13, 2023 07:05
@csy0225 force-pushed the inference_xpu_performance_compare branch 3 times, most recently from f3c2829 to 4b25458 on February 14, 2023 03:35
@csy0225 force-pushed the inference_xpu_performance_compare branch from 4b25458 to 2f92c7c on February 14, 2023 03:41
      std::lock_guard<std::mutex> lock(cache_update_mutex_);
      if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Review comment (Contributor):

Multi-threaded code needs to check the condition twice:

if (cond)
  lock
  if (cond)

Also, will the traversal-style lookup and comparison significantly increase CPU load compared with the original code?
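For reference, a minimal sketch of the double-checked locking pattern the reviewer is describing (illustrative only, not the merged code; a production version would guard the unlocked read with std::atomic or std::call_once to avoid a data race):

#include <memory>
#include <mutex>

struct RuntimeContext {};  // hypothetical stand-in for the cached context

class Op {
 public:
  RuntimeContext* GetOrCreateRuntimeContext() {
    // First check without the lock keeps the common (already-cached) path cheap.
    if (runtime_ctx_ == nullptr) {
      std::lock_guard<std::mutex> lock(cache_update_mutex_);
      // Second check under the lock: another thread may have built the
      // context between the unlocked check and acquiring the mutex.
      if (runtime_ctx_ == nullptr) {
        runtime_ctx_ = std::make_unique<RuntimeContext>();
      }
    }
    return runtime_ctx_.get();
  }

 private:
  std::unique_ptr<RuntimeContext> runtime_ctx_;
  std::mutex cache_update_mutex_;
};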

  } else if (run_phi_kernel_ && impl_ != nullptr && !need_prepare_data_ &&
             !need_prepare_phi_data_) {
    if (!all_kernels_must_compute_runtime_shape_ && impl_->NeedInferShape()) {
      this->Info().infer_shape_(impl_->getRuntimeInferShapeContext());
    }
    (*phi_kernel_)(impl_->getKernelContext());
  } else {
-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
Review comment (Contributor):

It's unclear how much overhead this adds. Whenever this branch is taken, every run now fetches vars from the scope, which could become a new hotspot.

On the same hunk, at the removed line:

-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Review comment (Contributor):

Is the pre_scope_ != cur_scope case equivalent to the input tensor's shape or layout having changed?

@csy0225 (Author) replied on Feb 15, 2023:

They are somewhat similar, but in the previous code, with enable_cache_runtime_context, whenever any tensor changed (e.g. cpu->mkldnn), pre_scope_ was forcibly set to null, so every run had to recreate the runtime_context and re-run prepare_data, which is very time-consuming; the cache was effectively disabled. Yet after the first run, a second run whose tensor shapes and layouts are unchanged can still use the cache. This PR fixes that problem.

The reviewer replied:

Is there a case where pre_scope_ is not null but is not the same as the current scope?

@csy0225 (Author) replied:

From the code logic, no: the first time an op runs, pre_scope_ is assigned cur_scope, and since there is no other assignment, pre_scope_ can only be cur_scope.

The reviewer replied:

While debugging, I've previously seen the same op receive two different scopes on consecutive runs; I'm not sure whether pre_scope_ would then disagree with the scope and trigger an update.

On the same hunk:

-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
Review comment (Contributor):

This interface is called on every inference, so the impact surface is large. Could it cause performance regressions on other hardware? Is the cache also enabled for Inference+GPU now?

@csy0225 (Author) replied:

Yes, I've discussed this with the QA team. They need to build and validate a release package soon, so this will be merged into develop for now; performance will be tracked via the develop branch benchmark tests, and they'll contact me if any issue turns up.

@hong19860320 (Contributor) left a comment:

LGTM

@qili93 (Contributor) left a comment:

LGTM

@qili93 merged commit b39afb1 into PaddlePaddle:develop on Feb 21, 2023
@csy0225 mentioned this pull request on Feb 25, 2023