[Paddle inference] support new quant_model #41049
Conversation
Your PR was submitted successfully. Thank you for contributing to the open-source project!
        quant_op->Op()->InputNames()[i],
        quant_op->Op()->GetAttr(
            "Input_scale_" +
            quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[0]));
quant_op->Op()->InputNames()[i] is used three times here. Could you declare a local variable to improve readability?
OK, will do.
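A minimal sketch of the suggested refactor; the local-variable names are illustrative, not from the PR:

auto* op_desc = quant_op->Op();
// Cache the repeated lookups in locals so the condition and the
// GetAttr call read more easily. Names are illustrative.
const std::string& arg_name = op_desc->InputNames()[i];
const auto& input_var_names = op_desc->Input(arg_name);
if (!input_var_names.empty() &&
    op_desc->HasAttr("Input_scale_" + input_var_names[0])) {
  auto scale_attr = op_desc->GetAttr("Input_scale_" + input_var_names[0]);
  // ... use arg_name and scale_attr as before
}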
for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  if (quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size() > 0 &&
      quant_op->Op()->HasAttr(
          "Input_scale_" +
It would be better to define a method that constructs this attr name, so the behavior stays consistent across SetAttr, GetAttr, and HasAttr.
Currently the scale is bound to the tensor. The intermediate passes can get the tensor's name, but the last pass renames the tensor, so the converter can no longer match it to its scale. There is no solution for this yet.
This is an intermediate variable that provides the tensor scales to subsequent passes.
// If an input tensor has an input scale, save its index in
// input_quant_tensor_index.
// Op attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Could a PADDLE_ENFORCE_EQ be used to handle this unsupported case?
If a PADDLE_ENFORCE_EQ fires inside a pass, the whole process crashes; returning is preferable. I can add a warning instead: if any input or output node is a tensor list, the pass returns and logs a warning.
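A sketch of the warn-and-return behavior described here; the condition and message are illustrative, not the PR's exact code:

// Instead of PADDLE_ENFORCE_EQ, which aborts the process, skip the op
// with a warning when an input argument holds a tensor list.
if (quant_op->Op()->Input(arg_name).size() > 1) {
  LOG(WARNING) << "tensor-list inputs are not supported by this pass, "
               << "skipping op " << quant_op->Op()->Type();
  return;
}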
// If an output tensor has an output scale, save its index in
// output_quant_tensor_index.
// Op attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Could a PADDLE_ENFORCE_EQ be used to handle this unsupported case?
for (auto out_op_node : out_node->outputs) {
  for (auto name : out_op_node->Op()->InputNames()) {
    for (auto input_name : out_op_node->Op()->Input(name)) {
      if (out_op_node->Op()->HasAttr("Input_scale_" + input_name)) {
Same as above: this way of concatenating keys is hard to read and hard to maintain.
The scale is bound to the tensor; I haven't found another way to set this attribute yet.
}
quant_op->Op()->SetAttr("support_int8", inscale_flag && outscale_flag);
If "outscale_flag" is not a necessary condition for a quant_op's "support_int8", is there still a need for this pass?
As I understand it, this pass does two things:
- It re-stores the current quant_op's input scale under a different attr key; couldn't delete_quant_dequant_linear_pass do that as well?
- It looks up the input scales of the quant_op's successor ops and sets them as the quant_op's output scale. That output scale is ultimately set as the dynamic range of the quant_op's output, which should have the same effect as setting the dynamic range of the successor op's input, right? (See the sketch after this exchange.)
This is designed for mixed-precision inference: quantization is decided at op granularity, and the final op_convert step sets the quantization info per op.
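For context on the reviewer's second point, a hedged sketch of why the two placements coincide in TensorRT terms; producer_layer and scale are placeholder names, and this is a general illustration, not the PR's code:

// In TensorRT the dynamic range lives on the ITensor itself, so setting
// it through the producer's output or the consumer's input touches the
// same tensor object.
nvinfer1::ITensor* t = producer_layer->getOutput(0);
t->setDynamicRange(-scale, scale);  // the successor consumes this same tensor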
@@ -86,6 +86,8 @@ pass_library(quant_conv2d_dequant_fuse_pass inference)
pass_library(shuffle_channel_detect_pass inference)
pass_library(delete_quant_dequant_op_pass inference)
pass_library(delete_quant_dequant_filter_op_pass inference)
pass_library(delete_weight_quant_dequant_linear_op_pass inference)
Weights only have dequant, not quant.
OK, I'll rename it.
The expectation is to add only two passes, delete_quant_dequant_linear_op_pass and delete_weight_dequant_linear_op_pass, and align with the original way scales are set; then none of the other fusion passes or converters would need to change.
If an existing fusion pass or converter is unreasonable, fix it in a separate PR. Concentrating all the risk in this one PR increases testing pressure and the probability of compatibility problems.
/*
if (!IsCompat(subgraph, g)) {
  LOG(WARNING) << "delete_quant_dequant_linear_op_pass "
                  "compat check failed.";
  return;
}
*/
remove unused lines?
This can only be enabled once the training op is merged.
    paddle::platform::is_cpu_place(input_scale_tensor.place()), true,
    platform::errors::InvalidArgument(
        "Input scale tensor's place should be CPU."));
const float* input_scale_data = input_scale_tensor.data<float>();
I see @yghstill also added support for the FP64 data type?
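If FP64 scales were supported, the read might branch on the tensor's data type. A hedged sketch; the dtype API shown (phi::DataType) is an assumption, not the PR's verified code:

// Accept both FP32 and FP64 scale tensors. The dtype enum used here is
// an assumption about the framework API.
float input_scale = 0.0f;
if (input_scale_tensor.dtype() == phi::DataType::FLOAT32) {
  input_scale = input_scale_tensor.data<float>()[0] / range;
} else if (input_scale_tensor.dtype() == phi::DataType::FLOAT64) {
  input_scale =
      static_cast<float>(input_scale_tensor.data<double>()[0]) / range;
}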
float input_scale = input_scale_data[0] / range;

auto* any_op2_desc = any_op2->Op();
any_op2_desc->SetAttr("Input_scale_" + quantize_linear_op_x->Var()->Name(),
Same as above: this key-concatenation approach could be improved.
if (quantized_op_type == "mul" || quantized_op_type == "matmul" ||
    quantized_op_type == "matmul_v2") {
  op_desc->SetAttr("X_scale", input_scale);
} else {
  op_desc->SetAttr("Input_scale", input_scale);
}
Can't the new format be aligned with the old way of setting these attrs, so the converters wouldn't need to change?
The original approach is indeed somewhat unreasonable, but that can be fixed in a separate PR; try to avoid having this PR touch so many modules.
They can't be aligned at the moment. The old attr was a single scalar, which cannot represent multiple inputs and outputs, let alone the case where one input (or output) is a tensor list. In principle a two-dimensional scale array is needed to fully bind the tensors, but the framework only supports one-dimensional arrays; I haven't found a solution yet.
//
// Licensed under the Apache License, Version 2.0 (the "License");
This is Apache 2.0, not Paddle 2.0; the version shouldn't be updated.
// input_quant_tensor_index.
// Op attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
// scale for one input.
/* |
Unused code can be deleted.
for (size_t i = 0; i < quant_op->Op()->OutputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Output(quant_op->Op()->OutputNames()[i]).size(); j++) {
    if (input_name ==
Same as above: if unused, it can be deleted.
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.3
Version 2.3 → 2.0, LICENSE-2.3 → LICENSE-2.0. Please check all the copyright headers in this PR.
@@ -0,0 +1,148 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.3 (the "License");
Suggested change:
- // Licensed under the Apache License, Version 2.3 (the "License");
+ // Licensed under the Apache License, Version 2.0 (the "License");
// scale for one input
/*
for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size(); j++) {
    if (quant_op->Op()->HasAttr("Input_scale_" +
            quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])) {
      inscale_flag = true;
      input_quant_tensor_index.push_back(std::make_pair(i, j));
      inputs_scale.push_back(BOOST_GET_CONST(float,
          quant_op->Op()->GetAttr("Input_scale_" +
              quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])));
    }
  }
}
*/
Is this commented-out code for the unsupported case?
Yes, I'll just delete it.
      mul0_op_desc->GetAttr("Input_scale"));
}
auto* add0_op_desc = eltadd0->Op();
auto* add1_op_desc = eltadd1->Op();
auto* add2_op_desc = eltadd2->Op();
if (add0_op_desc->HasAttr("out_threshold")) {
  auto out_scale0 =
      BOOST_GET_CONST(float, add0_op_desc->GetAttr("out_threshold"));
  auto out_scale1 =
      BOOST_GET_CONST(float, add1_op_desc->GetAttr("out_threshold"));
  auto out_scale2 =
      BOOST_GET_CONST(float, add2_op_desc->GetAttr("out_threshold"));
  auto out_scale_max = std::max(out_scale0, out_scale1);
  out_scale_max = std::max(out_scale_max, out_scale2);
  multihead_op_desc.SetAttr("fc_out_threshold", out_scale_max);
Is the weight scale no longer needed here?
Yes.
@@ -356,7 +355,7 @@ void QuantDequantFusePass::DeleteQuant(ir::Graph* graph, Scope* scope,
          "Input scale tensor's place should be CPU."));
  const float* input_scale_data = input_scale_tensor.data<float>();
  float in_scale = input_scale_data[0];
- float scale_value = in_scale / range;
+ float scale_value = in_scale;
With the division by range removed, is this still compatible with models from earlier quantization versions?
Yes, it's compatible; every place that uses this variable has been updated accordingly.
@@ -148,14 +146,18 @@ class FcOpConverter : public OpConverter {
  auto regist_fc = [&](nvinfer1::ITensor* inputs, int n_output,
                       TensorRTEngine::Weight& weight,
                       TensorRTEngine::Weight& bias) {
-   if (enable_int8) {
+   if (enable_int8 || support_int8) {
Can't these two be unified?
We need to stay compatible with both quantization formats, so it can't be removed yet.
@@ -93,7 +91,7 @@ class MultiheadMatMulOpConverter : public OpConverter {
      static_cast<int32_t>(bias_t->numel())};
  if (engine_->with_interleaved()) {
    VLOG(4) << "fused multihead_matmul op: use_oss and with_interleaved";
-   if (!enable_int8) {
+   if (!op_desc.HasAttr("Input_scale")) {
Why change it to this check?
For a big op fused from multiple ops, this flag is a bit inappropriate, and I haven't decided what to replace it with, so for now I use the more direct attribute check; the two are essentially equivalent.
LGTM
PR types
Others
PR changes
Others
Describe
Paddle Inference supports the new quantized model format. This covers the full set of inference quantization passes and quantized ops; the quantization info is processed in the passes and consumed in op_converter.h and all the convert ops.