Optimize the ernie inference performance on xpu backend. #50357
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
force-pushed from ee9d9f4 to 26cd92a
force-pushed from b118c21 to f28ffab
force-pushed from 40a3403 to ac8a2bf
force-pushed from f3c2829 to 4b25458
force-pushed from 4b25458 to 2f92c7c
std::lock_guard<std::mutex> lock(cache_update_mutex_);
if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Multithreaded code should do the check twice (double-checked locking):

if (cond)
  lock
  if (cond)

Also, compared with the original code, will the traversal-style lookup significantly increase CPU load?
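For illustration, a minimal sketch of the double-checked pattern being suggested, using a hypothetical cache holder (the member names mirror operator.cc, but the surrounding struct is invented for this example):

#include <memory>
#include <mutex>

struct RuntimeContext {};  // stand-in for the real type

struct OpCache {
  std::unique_ptr<RuntimeContext> runtime_ctx_;
  std::mutex cache_update_mutex_;

  RuntimeContext* GetOrRebuild() {
    // First check without the lock: the common (cached) case pays no
    // locking cost on the hot path.
    if (runtime_ctx_ == nullptr) {
      std::lock_guard<std::mutex> lock(cache_update_mutex_);
      // Second check under the lock: another thread may have rebuilt
      // the context while we were waiting.
      if (runtime_ctx_ == nullptr) {
        runtime_ctx_ = std::make_unique<RuntimeContext>();
      }
    }
    return runtime_ctx_.get();
  }
};

Strictly speaking, the unlocked first read is a data race under the C++ memory model; a fully portable variant would keep the pointer in a std::atomic or use std::call_once.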
  } else if (run_phi_kernel_ && impl_ != nullptr && !need_prepare_data_ &&
             !need_prepare_phi_data_) {
    if (!all_kernels_must_compute_runtime_shape_ && impl_->NeedInferShape()) {
      this->Info().infer_shape_(impl_->getRuntimeInferShapeContext());
    }
    (*phi_kernel_)(impl_->getKernelContext());
  } else {
-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
It's unclear how much overhead this adds. If this branch is taken, every run now has to fetch vars from the scope, so this could become a new hotspot.
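To make the concern concrete, a check like NeedResetRuntimeContext presumably has to walk the op's inputs and look each one up in the scope on every run. The sketch below is hypothetical (stand-in types; the members and the shape comparison are assumptions, not the PR's actual code):

#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Minimal stand-ins; the real types are paddle::framework::Scope,
// Variable, and the tensor's DDim.
struct Dims {
  std::vector<int64_t> d;
  bool operator!=(const Dims& o) const { return d != o.d; }
};
struct Variable { Dims dims; };
struct Scope {
  std::map<std::string, Variable> vars;
  const Variable* FindVar(const std::string& n) const {
    auto it = vars.find(n);
    return it == vars.end() ? nullptr : &it->second;
  }
};

// Hypothetical per-run check in the spirit of NeedResetRuntimeContext:
// every call does one map lookup per input, which is exactly the
// overhead the reviewer is asking about.
bool NeedResetRuntimeContext(const Scope& scope,
                             const std::vector<std::string>& input_names,
                             const std::map<std::string, Dims>& cached_dims,
                             bool has_cached_ctx) {
  if (!has_cached_ctx) return true;  // nothing cached yet
  for (const auto& name : input_names) {
    const Variable* var = scope.FindVar(name);  // per-run scope lookup
    if (var == nullptr) return true;
    auto it = cached_dims.find(name);
    if (it == cached_dims.end() || var->dims != it->second)
      return true;  // shape changed since it was cached
  }
  return false;
}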
  } else if (run_phi_kernel_ && impl_ != nullptr && !need_prepare_data_ &&
             !need_prepare_phi_data_) {
    if (!all_kernels_must_compute_runtime_shape_ && impl_->NeedInferShape()) {
      this->Info().infer_shape_(impl_->getRuntimeInferShapeContext());
    }
    (*phi_kernel_)(impl_->getKernelContext());
  } else {
    if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
Is the pre_scope_ != cur_scope case equivalent to the input tensor's shape or layout having changed?
They are somewhat similar, but in the previous code, with enable_cache_runtime_context set, any tensor change (e.g. cpu->mkldnn) forced pre_scope to null, so every run had to recreate the runtime_context and then re-run prepare_data, which is very time-consuming; the cache was effectively disabled. Yet after the first run, a second run whose tensor shapes and layouts are unchanged can still use the cache. This PR fixes that problem.
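As a hedged illustration of the pre-fix behavior described above (stand-in types; the transfer flag and rebuild step only paraphrase the comment, they are not quoted from operator.cc):

#include <cstdio>
#include <memory>

struct Scope {};
struct RuntimeContext {};

struct Op {
  std::unique_ptr<RuntimeContext> runtime_ctx_;
  const Scope* pre_scope_ = nullptr;

  void Run(const Scope& cur_scope, bool data_transferred) {
    if (runtime_ctx_ == nullptr || pre_scope_ != &cur_scope) {
      // Expensive path: rebuild the context and re-run data preparation.
      runtime_ctx_ = std::make_unique<RuntimeContext>();
      pre_scope_ = &cur_scope;
      std::puts("rebuild runtime_ctx_ + PrepareData");
    }
    if (data_transferred) {
      pre_scope_ = nullptr;  // old code: unconditionally poisons the cache
    }
  }
};

int main() {
  Scope s;
  Op op;
  op.Run(s, /*data_transferred=*/true);  // first run: rebuild, as expected
  op.Run(s, /*data_transferred=*/true);  // rebuilds again: cache defeated
}

With the fix, the second run would instead compare input shapes/layouts and reuse the cached context when nothing has changed.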
Is there a case where pre_scope is non-null but is not the same as the current scope?
From the code logic, no such case exists: the first time the op runs, pre_scope is assigned cur_scope, and with no other assignment, pre_scope can only be cur_scope.
While debugging earlier, I ran into a case where the same op was passed different scopes on two consecutive runs; I'm not sure whether pre_scope would then diverge from the scope and trigger an update.
  } else if (run_phi_kernel_ && impl_ != nullptr && !need_prepare_data_ &&
             !need_prepare_phi_data_) {
    if (!all_kernels_must_compute_runtime_shape_ && impl_->NeedInferShape()) {
      this->Info().infer_shape_(impl_->getRuntimeInferShapeContext());
    }
    (*phi_kernel_)(impl_->getKernelContext());
  } else {
-   if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
+   if (NeedResetRuntimeContext(scope)) {
This interface has to be called on every inference run, so the impact is broad. Could it cause performance regressions on other hardware? Is the cache now also enabled for Inference+GPU?
Yes, I've discussed this with QA. They need to build and validate a release package soon, so we'll merge into develop for now; performance will be covered by the develop branch benchmark tests, and they'll contact me if any issue shows up.
LGTM
LGTM
PR types
Performance optimization
PR changes
Others
Describe
Adds a cache mechanism to XPU-backend inference to optimize ernie models. ernie medium model performance:
first inference 65 ms -> 20 ms, subsequent inference 45.07 ms -> 2.5 ms.
Specific changes:

Fix the enable_cache_runtime_context and prepare_data logic in operator.cc:

Remove the pre_scope_ != cur_scope check. That check only applies when enable_cache_runtime_context_ == true, where it decides whether to regenerate the runtime_context; since enable_cache_runtime_context_ exists precisely to reuse the runtime_context, checking runtime_ctx_.get() == nullptr is sufficient and the extra comparison is redundant (see the condition recap after this list).

Distinguish transfer_cache_scope from new_scope. When enable_cache_runtime_context_ == true, the variables in the runtime_context are replaced by variables from new_scope, so new_scope must exist on every run and must not be deleted; CPU inference, however, does not need transfer_cache_scope, so the two must be kept separate.

Add logic to recreate the runtime_context:
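For reference, a schematic recap of the guard change quoted in the review diffs above (only the condition swap is shown; NeedResetRuntimeContext is the PR's new helper, and the rebuild bodies are placeholders):

// Before: rebuilt whenever the scope pointer differed, even with
// unchanged input shapes and layouts.
if (runtime_ctx_.get() == nullptr || pre_scope_ != cur_scope) {
  /* rebuild runtime_ctx_, re-run PrepareData */
}

// After: rebuild only when the helper detects a real change (missing
// context, or an input whose shape/layout no longer matches the cache).
if (NeedResetRuntimeContext(scope)) {
  /* rebuild runtime_ctx_, re-run PrepareData */
}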