[Paddle inference] support new quant_model #41049

Merged
merged 8 commits into PaddlePaddle:develop on Apr 2, 2022

Conversation

Wangzheee
Contributor

PR types

Others

PR changes

Others

Describe

Paddle Inference supports the new quantized-model format. This covers the full set of inference quantization passes and quantized OPs: the quantization information is processed in the passes and then consumed in op_converter.h and in all of the convert ops.

@wanghaoshuang wanghaoshuang requested review from ceci3 and yghstill March 28, 2022 14:37
@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

quant_op->Op()->InputNames()[i],
quant_op->Op()->GetAttr(
    "Input_scale_" +
    quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[0]));
Contributor

quant_op->Op()->InputNames()[i] is used three times here; could you declare a local variable to improve readability?

Contributor Author

Sure, will do.
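
For illustration, the suggested refactor could look roughly like this (a sketch with illustrative names, not the code that was eventually committed):

// Hoist the repeated sub-expressions into locals so the attr key is
// built only once (identifier names here are illustrative).
auto* op_desc = quant_op->Op();
const auto input_names = op_desc->InputNames();
for (size_t i = 0; i < input_names.size(); i++) {
  const auto args = op_desc->Input(input_names[i]);  // tensor arguments
  if (!args.empty() && op_desc->HasAttr("Input_scale_" + args[0])) {
    const float scale =
        BOOST_GET_CONST(float, op_desc->GetAttr("Input_scale_" + args[0]));
    // ... use scale ...
  }
}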

for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  if (quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size() > 0 &&
      quant_op->Op()->HasAttr(
          "Input_scale_" +
Contributor

It would be best to define a method that builds this attr name, so the behavior stays consistent across SetAttr, GetAttr, and HasAttr.
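
For reference, such a helper might look like this (a sketch only; this helper does not exist in the PR):

// A single point of truth for the per-tensor scale attribute key, so
// SetAttr/GetAttr/HasAttr can never drift apart (sketch only).
inline std::string InputScaleAttr(const std::string& tensor_name) {
  return "Input_scale_" + tensor_name;
}
// Usage:
//   op_desc->SetAttr(InputScaleAttr(name), scale);
//   if (op_desc->HasAttr(InputScaleAttr(name))) { ... }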

Contributor Author

Right now the scale is bound to a tensor. The intermediate passes can get the tensor's name, but the last pass renames the tensor, so the converter can no longer match it to its scale. There is no solution for that yet.
This is an intermediate variable that provides the tensor scales to the subsequent passes.


// If an input tensor has an input scale, save its index in
// input_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Contributor

How about handling this unsupported case with a PADDLE_ENFORCE_EQ?

Contributor Author

If a PADDLE_ENFORCE_EQ fails inside the pass, the whole process crashes; I'd rather return. I can add a warning: if any input or output node is a tensor list, the pass returns and logs a warning.
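
Something along these lines (a sketch of the warning-and-return behavior described above; the message text is illustrative):

// Abort the pass gracefully instead of crashing via PADDLE_ENFORCE_EQ
// when one of the inputs is a tensor list.
for (auto const& name : quant_op->Op()->InputNames()) {
  if (quant_op->Op()->Input(name).size() > 1) {
    LOG(WARNING) << "tensor-list inputs are not supported by this quant "
                    "pass yet; skipping op " << quant_op->Op()->Type();
    return;
  }
}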


// If an output tensor has an output scale, save its index in
// output_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Contributor

How about handling this unsupported case with a PADDLE_ENFORCE_EQ?

for (auto out_op_node : out_node->outputs) {
  for (auto name : out_op_node->Op()->InputNames()) {
    for (auto input_name : out_op_node->Op()->Input(name)) {
      if (out_op_node->Op()->HasAttr("Input_scale_" + input_name)) {
Contributor

Same as above: this way of concatenating keys is hard to read and hard to maintain.

Contributor Author

The scale is bound to the tensor; I haven't thought of another way to set the attribute yet.

}
quant_op->Op()->SetAttr("support_int8", inscale_flag && outscale_flag);
Contributor

"outscale_flag"不是quant_op ”support_int8“的必要条件。 @yghstill @ceci3

Contributor

If "outscale_flag" is not a necessary condition for quant_op's "support_int8", is this pass still needed at all?
As I understand it, this pass does two things:

  1. It re-stores the input scale from the current quant_op's attrs under a different key; couldn't delete_quant_dequant_linear_pass do that as well?
  2. It looks up the input scales of quant_op's successor ops and sets them as quant_op's output scale; that output scale is ultimately set as the dynamic range of quant_op's output, which has the same effect as setting the dynamic range of the successor op's input, doesn't it?

Contributor Author

This is designed for mixed-precision inference: quantization is applied at OP granularity, and the quantization info is set per OP in the final op_convert step.

@@ -86,6 +86,8 @@ pass_library(quant_conv2d_dequant_fuse_pass inference)
pass_library(shuffle_channel_detect_pass inference)
pass_library(delete_quant_dequant_op_pass inference)
pass_library(delete_quant_dequant_filter_op_pass inference)
pass_library(delete_weight_quant_dequant_linear_op_pass inference)
Contributor

Weights only have dequant, not quant.

Contributor Author

OK, I'll rename it.

Contributor

@wanghaoshuang wanghaoshuang left a comment

The expectation was to add only the two passes delete_quant_dequant_linear_op_pass and delete_weight_dequant_linear_op_pass, aligned with the old way of setting scales, so that none of the other fusion passes or converters would need changes.
If something in the existing fusion passes or converters is unreasonable, it can be fixed in a separate PR. Concentrating all the risk in this one PR is not advisable: it increases the testing burden and the probability of compatibility problems.

Comment on lines +99 to +105
/*
  if (!IsCompat(subgraph, g)) {
    LOG(WARNING) << "delete_quant_dequant_linear_op_pass "
                    "compat check failed.";
    return;
  }
*/
Contributor

remove unused lines?

Contributor Author

This can only be re-enabled after the training OP is merged.

paddle::platform::is_cpu_place(input_scale_tensor.place()), true,
platform::errors::InvalidArgument(
"Input scale tensor's place should be CPU."));
const float* input_scale_data = input_scale_tensor.data<float>();
Contributor

I see @yghstill also added support for the FP64 numeric type here?

float input_scale = input_scale_data[0] / range;

auto* any_op2_desc = any_op2->Op();
any_op2_desc->SetAttr("Input_scale_" + quantize_linear_op_x->Var()->Name(),
Contributor

Same as above: this key-concatenation style could be improved.

Comment on lines -88 to -93
if (quantized_op_type == "mul" || quantized_op_type == "matmul" ||
    quantized_op_type == "matmul_v2") {
  op_desc->SetAttr("X_scale", input_scale);
} else {
  op_desc->SetAttr("Input_scale", input_scale);
}
Contributor

Can't the new format be aligned with the old way of setting the attrs, so the converters wouldn't need changes?
The old way is somewhat unreasonable, but it can be fixed in a separate PR; let's avoid having the current PR touch so many modules.

Contributor Author

They can't be aligned at the moment. The old attr was a single scalar, which can't represent the multi-input/multi-output case, let alone the case where one of the inputs (or outputs) is a tensor list. In principle a 2-D scale array is needed to fully bind the tensors, but the framework only supports 1-D arrays; no solution has been found yet.
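
For the record, a workaround sometimes used for this limitation (purely a sketch of the idea, not something this PR implements) is to flatten the 2-D binding into two parallel 1-D attributes:

// OpDesc attrs support std::vector<int> / std::vector<float>, but not
// std::vector<std::pair<...>>, so each (input_i, tensor_j) pair is encoded
// as two consecutive ints, with the scales kept in a parallel vector.
std::vector<int> quant_tensor_index;    // flattened pairs: [i0, j0, i1, j1, ...]
std::vector<float> quant_tensor_scale;  // one scale per pair: [s0, s1, ...]
// inside the (i, j) loop over inputs and their tensor arguments:
quant_tensor_index.push_back(static_cast<int>(i));
quant_tensor_index.push_back(static_cast<int>(j));
quant_tensor_scale.push_back(scale);
// after the loop:
op_desc->SetAttr("input_quant_tensor_index", quant_tensor_index);
op_desc->SetAttr("inputs_scale", quant_tensor_scale);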

//
// Licensed under the Apache License, Version 2.0 (the "License");
Member

This is Apache 2.0, not Paddle 2.0; the version doesn't need updating here.

// input_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
// scale for one input
/*
Member

Unused code can be deleted.

for (size_t i = 0; i < quant_op->Op()->OutputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Output(quant_op->Op()->OutputNames()[i]).size(); j++) {
    if (input_name ==
Member

Same as above: unused, can be deleted.

// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.3
Member

Version 2.3 -> Version 2.0, LICENSE-2.3 -> LICENSE-2.0; please audit all the copyright headers in this PR.

@@ -0,0 +1,148 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.3 (the "License");
Member

Suggested change
// Licensed under the Apache License, Version 2.3 (the "License");
// Licensed under the Apache License, Version 2.0(the "License");

Comment on lines 43 to 56
// scale for one input
/*
for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size(); j++) {
    if (quant_op->Op()->HasAttr(
            "Input_scale_" +
            quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])) {
      inscale_flag = true;
      input_quant_tensor_index.push_back(std::make_pair(i, j));
      inputs_scale.push_back(BOOST_GET_CONST(
          float,
          quant_op->Op()->GetAttr(
              "Input_scale_" +
              quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])));
    }
  }
}
*/
Contributor

Is this the code for the unsupported case, just commented out?

Contributor Author

Yes, I'll just delete it.

Comment on lines +869 to +883
mul0_op_desc->GetAttr("Input_scale"));
}
auto* add0_op_desc = eltadd0->Op();
auto* add1_op_desc = eltadd1->Op();
auto* add2_op_desc = eltadd2->Op();
if (add0_op_desc->HasAttr("out_threshold")) {
auto out_scale0 =
BOOST_GET_CONST(float, add0_op_desc->GetAttr("out_threshold"));
auto out_scale1 =
BOOST_GET_CONST(float, add1_op_desc->GetAttr("out_threshold"));
auto out_scale2 =
BOOST_GET_CONST(float, add2_op_desc->GetAttr("out_threshold"));
auto out_scale_max = std::max(out_scale0, out_scale1);
out_scale_max = std::max(out_scale_max, out_scale2);
multihead_op_desc.SetAttr("fc_out_threshold", out_scale_max);
Contributor

Is the weight scale no longer needed here?

Contributor Author

That's right.

@@ -356,7 +355,7 @@ void QuantDequantFusePass::DeleteQuant(ir::Graph* graph, Scope* scope,
"Input scale tensor's place should be CPU."));
const float* input_scale_data = input_scale_tensor.data<float>();
float in_scale = input_scale_data[0];
float scale_value = in_scale / range;
float scale_value = in_scale;
Contributor

With the "/ range" removed, is this still compatible with models quantized by the previous versions?

Contributor Author

Yes, it's compatible; every place that uses this variable was updated accordingly.
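
To spell out the two conventions side by side (a sketch; assuming the usual int8 range of 2^(8-1) - 1 = 127):

// Before this PR, the value stored here was already divided by range:
//   float scale_value = in_scale / range;  // e.g. range = 127 for int8
// After this PR the raw scale is stored instead:
float scale_value = in_scale;
// Compatibility holds because every downstream consumer of scale_value
// was updated in the same PR to apply (or omit) the division consistently.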

@@ -148,14 +146,18 @@ class FcOpConverter : public OpConverter {
auto regist_fc = [&](nvinfer1::ITensor* inputs, int n_output,
TensorRTEngine::Weight& weight,
TensorRTEngine::Weight& bias) {
if (enable_int8) {
if (enable_int8 || support_int8) {
Contributor

Can't these two be unified?

Contributor Author

We have to stay compatible with both quantization formats, so it can't be removed yet.
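
A sketch of how a converter can honor both formats (the enable_int8 read mirrors the existing converter pattern; the rest is illustrative):

// Old-format models mark ops via enable_int8; new-format models carry the
// per-op "support_int8" attribute set by the new passes. Until the old
// format is retired, the converter takes the int8 path for either flag.
bool enable_int8 = op_desc.HasAttr("enable_int8");
bool support_int8 =
    op_desc.HasAttr("support_int8") &&
    BOOST_GET_CONST(bool, op_desc.GetAttr("support_int8"));
if (enable_int8 || support_int8) {
  // int8 path: read the per-tensor scales, set TensorRT dynamic ranges
} else {
  // fp32 / fp16 path
}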

@@ -93,7 +91,7 @@ class MultiheadMatMulOpConverter : public OpConverter {
static_cast<int32_t>(bias_t->numel())};
if (engine_->with_interleaved()) {
VLOG(4) << "fused multihead_matmul op: use_oss and with_interleaved";
if (!enable_int8) {
if (!op_desc.HasAttr("Input_scale")) {
Contributor

Why switch to this check?

Contributor Author

For a big OP fused from several ops, this flag is a bit awkward; I haven't found a good replacement yet, so for now the check uses the attribute directly, which is more explicit. The two checks are essentially equivalent.

Member

@shangzhizhou shangzhizhou left a comment

LGTM

@Wangzheee Wangzheee merged commit 1b58ce1 into PaddlePaddle:develop Apr 2, 2022