[Paddle inference] support new quant_model #41049

Merged
merged 8 commits into PaddlePaddle:develop on Apr 2, 2022

Conversation

Wangzheee
Contributor

PR types

Others

PR changes

Others

Describe

Paddle Inference supports the new quantized-model format. This covers the full set of inference quantization passes and quantized OPs: the quantization information is processed in the passes and then consumed in op_converter.h and in all of the convert ops.

@wanghaoshuang wanghaoshuang requested review from ceci3 and yghstill March 28, 2022 14:37
@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

quant_op->Op()->InputNames()[i],
quant_op->Op()->GetAttr(
    "Input_scale_" +
    quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[0]));
Contributor

quant_op->Op()->InputNames()[i] is used three times here; could you declare a local variable to improve readability?

Contributor Author

Sure, will do.
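
For illustration, the suggested refactor could look roughly like this (a sketch with illustrative names, not the code that was eventually committed):

// Hoist the repeated sub-expressions into locals so the attr key is
// built only once (identifier names here are illustrative).
auto* op_desc = quant_op->Op();
const auto input_names = op_desc->InputNames();
for (size_t i = 0; i < input_names.size(); i++) {
  const auto args = op_desc->Input(input_names[i]);  // tensor arguments
  if (!args.empty() && op_desc->HasAttr("Input_scale_" + args[0])) {
    const float scale =
        BOOST_GET_CONST(float, op_desc->GetAttr("Input_scale_" + args[0]));
    // ... use scale ...
  }
}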

for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  if (quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size() > 0 &&
      quant_op->Op()->HasAttr(
          "Input_scale_" +
Contributor

It would be best to define a method that builds this attr name, so the behavior stays consistent across SetAttr, GetAttr, and HasAttr.
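
For reference, such a helper might look like this (a sketch only; this helper does not exist in the PR):

// A single point of truth for the per-tensor scale attribute key, so
// SetAttr/GetAttr/HasAttr can never drift apart (sketch only).
inline std::string InputScaleAttr(const std::string& tensor_name) {
  return "Input_scale_" + tensor_name;
}
// Usage:
//   op_desc->SetAttr(InputScaleAttr(name), scale);
//   if (op_desc->HasAttr(InputScaleAttr(name))) { ... }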

Contributor Author

Right now the scale is bound to a tensor. The intermediate passes can get the tensor's name, but the last pass renames the tensor, so the converter can no longer match it to its scale. There is no solution for that yet.
This is an intermediate variable that provides the tensor scales to the subsequent passes.


// If an input tensor has an input scale, save its index in
// input_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Contributor

How about handling this unsupported case with a PADDLE_ENFORCE_EQ?

Contributor Author

If a PADDLE_ENFORCE_EQ fails inside the pass, the whole process crashes; I'd rather return. I can add a warning: if any input or output node is a tensor list, the pass returns and logs a warning.
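
Something along these lines (a sketch of the warning-and-return behavior described above; the message text is illustrative):

// Abort the pass gracefully instead of crashing via PADDLE_ENFORCE_EQ
// when one of the inputs is a tensor list.
for (auto const& name : quant_op->Op()->InputNames()) {
  if (quant_op->Op()->Input(name).size() > 1) {
    LOG(WARNING) << "tensor-list inputs are not supported by this quant "
                    "pass yet; skipping op " << quant_op->Op()->Type();
    return;
  }
}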


// If an output tensor has an output scale, save its index in
// output_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
Contributor

How about handling this unsupported case with a PADDLE_ENFORCE_EQ?

for (auto out_op_node : out_node->outputs) {
  for (auto name : out_op_node->Op()->InputNames()) {
    for (auto input_name : out_op_node->Op()->Input(name)) {
      if (out_op_node->Op()->HasAttr("Input_scale_" + input_name)) {
Contributor

Same as above: this way of concatenating keys is hard to read and hard to maintain.

Contributor Author

The scale is bound to the tensor; I haven't thought of another way to set the attribute yet.

}
quant_op->Op()->SetAttr("support_int8", inscale_flag && outscale_flag);
Contributor

"outscale_flag"不是quant_op ”support_int8“的必要条件。 @yghstill @ceci3

Contributor

If "outscale_flag" is not a necessary condition for quant_op's "support_int8", is this pass still needed at all?
As I understand it, this pass does two things:

  1. It re-stores the input scale from the current quant_op's attrs under a different key; couldn't delete_quant_dequant_linear_pass do that as well?
  2. It looks up the input scales of quant_op's successor ops and sets them as quant_op's output scale; that output scale is ultimately set as the dynamic range of quant_op's output, which has the same effect as setting the dynamic range of the successor op's input, doesn't it?

Contributor Author

This is designed for mixed-precision inference: quantization is applied at OP granularity, and the quantization info is set per OP in the final op_convert step.

@@ -86,6 +86,8 @@ pass_library(quant_conv2d_dequant_fuse_pass inference)
pass_library(shuffle_channel_detect_pass inference)
pass_library(delete_quant_dequant_op_pass inference)
pass_library(delete_quant_dequant_filter_op_pass inference)
pass_library(delete_weight_quant_dequant_linear_op_pass inference)
Contributor

Weights only have dequant, not quant.

Contributor Author

OK, I'll rename it.

Contributor

@wanghaoshuang wanghaoshuang left a comment

The expectation was to add only the two passes delete_quant_dequant_linear_op_pass and delete_weight_dequant_linear_op_pass, aligned with the old way of setting scales, so that none of the other fusion passes or converters would need changes.
If something in the existing fusion passes or converters is unreasonable, it can be fixed in a separate PR. Concentrating all the risk in this one PR is not advisable: it increases the testing burden and the probability of compatibility problems.

Comment on lines +99 to +105
/*
  if (!IsCompat(subgraph, g)) {
    LOG(WARNING) << "delete_quant_dequant_linear_op_pass "
                    "compat check failed.";
    return;
  }
*/
Contributor

remove unused lines?

Contributor Author

This can only be re-enabled after the training OP is merged.

paddle::platform::is_cpu_place(input_scale_tensor.place()), true,
platform::errors::InvalidArgument(
"Input scale tensor's place should be CPU."));
const float* input_scale_data = input_scale_tensor.data<float>();
Contributor

I see @yghstill also added support for the FP64 numeric type here?

float input_scale = input_scale_data[0] / range;

auto* any_op2_desc = any_op2->Op();
any_op2_desc->SetAttr("Input_scale_" + quantize_linear_op_x->Var()->Name(),
Contributor

Same as above: this key-concatenation style could be improved.

Comment on lines -88 to -93
if (quantized_op_type == "mul" || quantized_op_type == "matmul" ||
    quantized_op_type == "matmul_v2") {
  op_desc->SetAttr("X_scale", input_scale);
} else {
  op_desc->SetAttr("Input_scale", input_scale);
}
Contributor

Can't the new format be aligned with the old way of setting the attrs, so the converters wouldn't need changes?
The old way is somewhat unreasonable, but it can be fixed in a separate PR; let's avoid having the current PR touch so many modules.

Contributor Author

They can't be aligned at the moment. The old attr was a single scalar, which can't represent the multi-input/multi-output case, let alone the case where one of the inputs (or outputs) is a tensor list. In principle a 2-D scale array is needed to fully bind the tensors, but the framework only supports 1-D arrays; no solution has been found yet.
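
For the record, a workaround sometimes used for this limitation (purely a sketch of the idea, not something this PR implements) is to flatten the 2-D binding into two parallel 1-D attributes:

// OpDesc attrs support std::vector<int> / std::vector<float>, but not
// std::vector<std::pair<...>>, so each (input_i, tensor_j) pair is encoded
// as two consecutive ints, with the scales kept in a parallel vector.
std::vector<int> quant_tensor_index;    // flattened pairs: [i0, j0, i1, j1, ...]
std::vector<float> quant_tensor_scale;  // one scale per pair: [s0, s1, ...]
// inside the (i, j) loop over inputs and their tensor arguments:
quant_tensor_index.push_back(static_cast<int>(i));
quant_tensor_index.push_back(static_cast<int>(j));
quant_tensor_scale.push_back(scale);
// after the loop:
op_desc->SetAttr("input_quant_tensor_index", quant_tensor_index);
op_desc->SetAttr("inputs_scale", quant_tensor_scale);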

//
// Licensed under the Apache License, Version 2.0 (the "License");
Member

This is Apache 2.0, not Paddle 2.0; the version doesn't need updating here.

// input_quant_tensor_index
// OpDesc attrs don't support std::vector<std::pair<>>. TODO: support multi-tensor
// scale for one input
/*
Member

Unused code can be deleted.

for (size_t i = 0; i < quant_op->Op()->OutputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Output(quant_op->Op()->OutputNames()[i]).size(); j++) {
    if (input_name ==
Member

Same as above: unused, can be deleted.

// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.3
Member

Version 2.3 -> Version 2.0, LICENSE-2.3 -> LICENSE-2.0; please audit all the copyright headers in this PR.

@@ -0,0 +1,148 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.3 (the "License");
Member

Suggested change
// Licensed under the Apache License, Version 2.3 (the "License");
// Licensed under the Apache License, Version 2.0(the "License");

Comment on lines 43 to 56
// scale for one input
/*
for (size_t i = 0; i < quant_op->Op()->InputNames().size(); i++) {
  for (size_t j = 0; j <
       quant_op->Op()->Input(quant_op->Op()->InputNames()[i]).size(); j++) {
    if (quant_op->Op()->HasAttr(
            "Input_scale_" +
            quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])) {
      inscale_flag = true;
      input_quant_tensor_index.push_back(std::make_pair(i, j));
      inputs_scale.push_back(BOOST_GET_CONST(
          float,
          quant_op->Op()->GetAttr(
              "Input_scale_" +
              quant_op->Op()->Input(quant_op->Op()->InputNames()[i])[j])));
    }
  }
}
*/
Contributor

Is this the code for the unsupported case, just commented out?

Contributor Author

Yes, I'll just delete it.

Comment on lines +869 to +883
mul0_op_desc->GetAttr("Input_scale"));
}
auto* add0_op_desc = eltadd0->Op();
auto* add1_op_desc = eltadd1->Op();
auto* add2_op_desc = eltadd2->Op();
if (add0_op_desc->HasAttr("out_threshold")) {
auto out_scale0 =
BOOST_GET_CONST(float, add0_op_desc->GetAttr("out_threshold"));
auto out_scale1 =
BOOST_GET_CONST(float, add1_op_desc->GetAttr("out_threshold"));
auto out_scale2 =
BOOST_GET_CONST(float, add2_op_desc->GetAttr("out_threshold"));
auto out_scale_max = std::max(out_scale0, out_scale1);
out_scale_max = std::max(out_scale_max, out_scale2);
multihead_op_desc.SetAttr("fc_out_threshold", out_scale_max);
Contributor

Is the weight scale no longer needed here?

Contributor Author

That's right.

@@ -356,7 +355,7 @@ void QuantDequantFusePass::DeleteQuant(ir::Graph* graph, Scope* scope,
"Input scale tensor's place should be CPU."));
const float* input_scale_data = input_scale_tensor.data<float>();
float in_scale = input_scale_data[0];
float scale_value = in_scale / range;
float scale_value = in_scale;
Contributor

With the "/ range" removed, is this still compatible with models quantized by the previous versions?

Contributor Author

Yes, it's compatible; every place that uses this variable was updated accordingly.
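
To spell out the two conventions side by side (a sketch; assuming the usual int8 range of 2^(8-1) - 1 = 127):

// Before this PR, the value stored here was already divided by range:
//   float scale_value = in_scale / range;  // e.g. range = 127 for int8
// After this PR the raw scale is stored instead:
float scale_value = in_scale;
// Compatibility holds because every downstream consumer of scale_value
// was updated in the same PR to apply (or omit) the division consistently.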

@@ -148,14 +146,18 @@ class FcOpConverter : public OpConverter {
auto regist_fc = [&](nvinfer1::ITensor* inputs, int n_output,
TensorRTEngine::Weight& weight,
TensorRTEngine::Weight& bias) {
if (enable_int8) {
if (enable_int8 || support_int8) {
Contributor

Can't these two be unified?

Contributor Author

We have to stay compatible with both quantization formats, so it can't be removed yet.
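
A sketch of how a converter can honor both formats (the enable_int8 read mirrors the existing converter pattern; the rest is illustrative):

// Old-format models mark ops via enable_int8; new-format models carry the
// per-op "support_int8" attribute set by the new passes. Until the old
// format is retired, the converter takes the int8 path for either flag.
bool enable_int8 = op_desc.HasAttr("enable_int8");
bool support_int8 =
    op_desc.HasAttr("support_int8") &&
    BOOST_GET_CONST(bool, op_desc.GetAttr("support_int8"));
if (enable_int8 || support_int8) {
  // int8 path: read the per-tensor scales, set TensorRT dynamic ranges
} else {
  // fp32 / fp16 path
}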

@@ -93,7 +91,7 @@ class MultiheadMatMulOpConverter : public OpConverter {
static_cast<int32_t>(bias_t->numel())};
if (engine_->with_interleaved()) {
VLOG(4) << "fused multihead_matmul op: use_oss and with_interleaved";
if (!enable_int8) {
if (!op_desc.HasAttr("Input_scale")) {
Contributor

Why switch to this check?

Contributor Author

For a big OP fused from several ops, this flag is a bit awkward; I haven't found a good replacement yet, so for now the check uses the attribute directly, which is more explicit. The two checks are essentially equivalent.

Member

@shangzhizhou shangzhizhou left a comment

LGTM

@Wangzheee Wangzheee merged commit 1b58ce1 into PaddlePaddle:develop Apr 2, 2022