[RFC][Quantization] Designing and lowering of quantized ops #3512
Conversation
This is the continuation of pull request apache#3367. In this PR I want to discuss the design and implementation of:
- Quantize op: FP32 -> i8/u8
- Dequantize op: i8/u8 -> FP32

I have added test cases to verify the correctness of the ops.
1. Correcting docs.
2. Reordering Clip and Cast in the dequantize op for stability.
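The arithmetic described above can be sketched as scalar math. This is a hypothetical, TVM-independent illustration; the function names, the zero-point-free scheme, and the i8-only range are assumptions for brevity, not the PR's actual tensor-level lowering:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch: quantize maps FP32 to i8, dequantize maps
// i8 back to FP32. The real ops work on tensors and also support u8;
// this only illustrates the arithmetic.
inline int8_t QuantizeToInt8(float value, float scale) {
  // Round away from zero, as agreed in the quantization RFC.
  int32_t q = static_cast<int32_t>(std::round(value / scale));
  // Clip to the representable range of the output dtype.
  q = std::max(-128, std::min(127, q));
  return static_cast<int8_t>(q);
}

inline float DequantizeFromInt8(int8_t q, float scale) {
  return static_cast<float>(q) * scale;
}
```

For example, with scale 0.1, quantizing 0.5 gives 5, out-of-range values clip to -128/127, and dequantizing 5 recovers roughly 0.5 (up to float rounding).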
Is this ready for review? Have we converged on the design in the quantization RFC?
We have made good progress on the Quantization RFC, achieving clarity and convergence on many points.
src/relay/pass/quantize_rewrite.cc
Outdated
const auto scale = MakeConstantScalar(Float(32), attrs->output_scale);
const int32_t min_val = get_qmin(out_dtype);
const int32_t max_val = get_qmax(out_dtype);
auto scale_data = Cast(Round(Divide(data, scale)), Int(32));
As discussed in the RFC, this should be round-away-from-zero, i.e. std::round in C++. What about this Round here? I don't see a definition of this Round.
This is the Relay Round operator, which translates to the LLVM round intrinsic:
https://llvm.org/docs/LangRef.html#llvm-round-intrinsic
The comment at the above link says it has the same behavior as the libm round function, which is defined here:
https://sourceware.org/newlib/libm.html#lround
This is round-away-from-zero.
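To make the tie-breaking behaviour concrete, here is a small standalone check in plain C++ (no TVM dependency); `RoundAwayFromZero` is just an illustrative wrapper name, not a function from the PR:

```cpp
#include <cmath>

// std::round (libm round, the LLVM round intrinsic) breaks halfway ties
// away from zero, unlike std::nearbyint under the default FE_TONEAREST
// rounding mode, which breaks ties to the nearest even value.
inline double RoundAwayFromZero(double x) { return std::round(x); }
```

So RoundAwayFromZero(2.5) is 3.0 and RoundAwayFromZero(-2.5) is -3.0, while std::nearbyint(2.5) is 2.0 (ties to even).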
Ok. That makes sense.
@FrozenGene and @tqchen, any other major comments for the PR?
Mainly organizational issues; please make things consistent with what was discussed in #3531.
1. Correcting the file paths as suggested in the reviews.
2. Fixing lint issues.
@liangfu, I made the changes you suggested.
LGTM
I have not checked the lowering code. Basically two things:
- Do Quantize and Dequantize handle 16-bit data types, regarding PR [QNN] Requantize operator #3531?
- This PR review may need to be suspended until PR [QNN] Requantize operator #3531 is merged, as there will be many rebase tasks to do.
DataType out_dtype;

TVM_DECLARE_ATTRS(QuantizeAttrs, "relay.attrs.QuantizeAttrs") {
  TVM_ATTR_FIELD(out_dtype)
I have seen that PR #3531 accepts 8/16-bit quantized tensor data types; are we going to align?
namespace tvm {
namespace relay {

inline bool IsInt8(const DataType& dtype) {
hmmmmmm, do we really need such utils?
  return dtype == Float(32);
}

inline bool IsQuantizedType(const DataType& dtype) {
Yes, I am waiting for #3531 to get merged. It seems to be in flux for now. Once it is merged, I will make the changes in this one.
case QuantizeOpType::Quantize:
  return IsFloat32(in_dtype);
case QuantizeOpType::Dequantize:
  return IsQuantizedType(in_dtype);
This type check seems inconsistent with your API definition.
                    const DataType& in_dtype) {
  switch (op_type) {
    case QuantizeOpType::Quantize:
      return IsQuantizedType(in_dtype);
This type check seems inconsistent with your API definition. Also, @anijain2305, do we need to align?
I think that if someone defines an OpType and its conversions, they are responsible for including that code in one unified PR. Otherwise, don't include the OpType definition or conversions.
I agree; let's finish #3531 and then I will update this one.
  }
}

inline const int32_t GetQmin(const DataType& dtype) {
The implementation seems inconsistent with PR #3531, whose code is simpler.
  } else if (IsUint32(dtype)) {
    return std::numeric_limits<uint32_t>::min();
  }
  LOG(FATAL) << "Type not supported\n";
Maybe include it in an else section.
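One possible simplified shape for these helpers, with the unsupported-type case handled after an exhaustive switch rather than a dangling return value. This is a hypothetical sketch: `QDType` is a plain stand-in enum for TVM's DataType so the example is self-contained, not the PR's actual code:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for TVM's DataType, for illustration only.
enum class QDType { kInt8, kUInt8, kInt16, kUInt16 };

// Smallest representable value of the quantized type.
inline int32_t GetQmin(QDType dtype) {
  switch (dtype) {
    case QDType::kInt8:   return std::numeric_limits<int8_t>::min();
    case QDType::kUInt8:  return std::numeric_limits<uint8_t>::min();
    case QDType::kInt16:  return std::numeric_limits<int16_t>::min();
    case QDType::kUInt16: return std::numeric_limits<uint16_t>::min();
  }
  throw std::invalid_argument("Type not supported");
}

// Largest representable value of the quantized type.
inline int32_t GetQmax(QDType dtype) {
  switch (dtype) {
    case QDType::kInt8:   return std::numeric_limits<int8_t>::max();
    case QDType::kUInt8:  return std::numeric_limits<uint8_t>::max();
    case QDType::kInt16:  return std::numeric_limits<int16_t>::max();
    case QDType::kUInt16: return std::numeric_limits<uint16_t>::max();
  }
  throw std::invalid_argument("Type not supported");
}
```

Using std::numeric_limits avoids hand-written constants, and the unreachable fallthrough means no sentinel like `return -1` is needed.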
  return -1;
}

inline const int32_t GetQmax(const DataType& dtype) {
Same as GetQmin().
Regarding 16/32-bit quantization, I have a discussion in RFC #3591.
[Relay] [Quantization] WIP - This is the continuation of pull request #3367