Quantize Weight for Gemm/Conv on Quantized Model #22969

centwang · 2024-11-28T10:20:40Z

Some quantized models have QDQ nodes around Conv/Gemm, but the weight and/or bias are left unquantized. This PR adds a WeightBiasQuantization optimizer that quantizes a float weight to INT8 and a float bias to INT32. It applies only when the weight and/or bias is an initializer, so that ConstantFolding can fold the sub-graph into real quantized initializers in the next round of graph optimization.

skottmckay · 2024-12-03T23:48:32Z

onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.h

+ *   For weight, it's quantized to symmetric per-tensor INT8 tensor.
+ *   For bias, it's quantized to a INT32 tensor with scale = scale_input_0 * scale_input_1 and zero_point = 0.


Could we say that we insert a Q and DQ after the weight so that it can potentially be quantized? I assume quantization is not guaranteed to happen (e.g. the EP wants to use full precision), so this wording may make it slightly clearer to the reader what is happening.
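To make the arithmetic in the quoted header comment concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization and INT32 bias quantization with scale = scale_input_0 * scale_input_1 and zero_point = 0. The names `QuantizedWeight`, `QuantizeWeightSymmetric` and `QuantizeBias`, and the choice to map onto [-127, 127], are assumptions made for illustration; they are not taken from the PR, which inserts Q/DQ nodes into the graph rather than quantizing values directly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for a symmetric per-tensor quantized weight (zero_point = 0).
struct QuantizedWeight {
  std::vector<int8_t> data;
  float scale;
};

// Hypothetical helper: symmetric per-tensor INT8 quantization.
// Assumption: the range [-max_abs, max_abs] is mapped onto [-127, 127].
QuantizedWeight QuantizeWeightSymmetric(const std::vector<float>& weight) {
  float max_abs = 0.0f;
  for (float w : weight) max_abs = std::max(max_abs, std::fabs(w));

  // Guard against an all-zero tensor, which would otherwise give scale = 0.
  const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

  QuantizedWeight result{{}, scale};
  result.data.reserve(weight.size());
  for (float w : weight) {
    const float q = std::nearbyint(w / scale);
    result.data.push_back(static_cast<int8_t>(std::clamp(q, -127.0f, 127.0f)));
  }
  return result;
}

// Hypothetical helper: INT32 bias quantization with zero_point = 0 and
// scale = input_scale * weight_scale, as stated in the header comment.
std::vector<int32_t> QuantizeBias(const std::vector<float>& bias,
                                  float input_scale, float weight_scale) {
  const float bias_scale = input_scale * weight_scale;
  std::vector<int32_t> out;
  out.reserve(bias.size());
  for (float b : bias) {
    out.push_back(static_cast<int32_t>(std::nearbyint(b / bias_scale)));
  }
  return out;
}
```

Using the product of the input and weight scales for the bias means the INT32 bias can be added directly to the INT32 accumulator of the quantized Conv/Gemm without rescaling.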

skottmckay · 2024-12-03T23:51:03Z

onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.cc

+      if (dq_attrs.find("axis") != dq_attrs.end()) {
+        axis = dq_attrs.at("axis").i();
+      }


nit: can avoid doing 2 lookups

Suggested change

-      if (dq_attrs.find("axis") != dq_attrs.end()) {
-        axis = dq_attrs.at("axis").i();
-      }
+      if (auto axis_iter = dq_attrs.find("axis"); axis_iter != dq_attrs.end()) {
+        axis = axis_iter->second.i();
+      }
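The single-lookup pattern in this suggestion can be tried in isolation. The sketch below uses a plain `std::unordered_map<std::string, int64_t>` as a stand-in for the attribute map; `AttrMap` and `GetAxisOrDefault` are hypothetical names, and the real attribute type in the PR exposes its value through `.i()` instead. It needs C++17 for the `if` statement with an initializer.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the node attribute map used in the PR.
using AttrMap = std::unordered_map<std::string, int64_t>;

// Returns the "axis" attribute if present, otherwise default_axis.
// find() is called once and the iterator is reused, instead of
// calling find() to test for presence and then at() to read the value.
int64_t GetAxisOrDefault(const AttrMap& attrs, int64_t default_axis) {
  int64_t axis = default_axis;
  if (auto axis_iter = attrs.find("axis"); axis_iter != attrs.end()) {
    axis = axis_iter->second;
  }
  return axis;
}
```

Declaring the iterator in the `if` initializer also keeps `axis_iter` scoped to the branch that uses it.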


quantize weight

9389bbe

centwang requested review from skottmckay, adrianlizarraga and jywu-msft November 28, 2024 10:20

centwang marked this pull request as ready for review November 28, 2024 10:20

centwang added 2 commits November 29, 2024 10:44

adjust ut scale and zp

aa00373

adjust ut data

3c170c3

skottmckay reviewed Dec 4, 2024


centwang added 2 commits December 4, 2024 11:31

resolve comments

d8d1156

fix warn

e5b9b40

skottmckay approved these changes Dec 4, 2024
