
[RFC][Quantization] Designing and lowering of quantized ops #3512

Closed (9 commits)

Conversation

@shoubhik (Contributor) commented Jul 8, 2019:

[Relay] [Quantization] WIP - This is the continuation of pull request #3367

In this PR I want to discuss the design and implementation of:

  • Quantize op: fp32 -> i8/u8
  • Dequantize op: i8/u8 -> fp32

I have added test cases to verify the correctness of the ops.
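For intuition only (this is not code from the PR), here is a minimal standalone C++ sketch of the arithmetic these two ops implement, assuming a symmetric scheme with a single scale and saturation to the target type's range; the helper names are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize: fp32 -> int8, q = clamp(round(x / scale), qmin, qmax).
int8_t QuantizeToInt8(float x, float scale) {
  const int32_t qmin = -128, qmax = 127;
  int32_t q = static_cast<int32_t>(std::round(x / scale));
  return static_cast<int8_t>(std::min(std::max(q, qmin), qmax));
}

// Dequantize: int8 -> fp32, x is approximately q * scale.
float DequantizeFromInt8(int8_t q, float scale) {
  return static_cast<float>(q) * scale;
}

int main() {
  float scale = 0.05f;
  float x = 1.234f;
  int8_t q = QuantizeToInt8(x, scale);
  std::printf("x=%f  q=%d  back=%f\n", x, q, DequantizeFromInt8(q, scale));
  return 0;
}
```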

Follow-up commits:
1. Correcting docs
2. Reordering Clip and Cast in the dequantize op for stability.
@shoubhik changed the title from "Quantize dequantize op" to "[RFC][Quantization] Designing and lowering of quantized ops" on Jul 8, 2019.
@zhiics (Member) commented Jul 8, 2019:

Is this ready for review? Have we converged on the design in the quantization RFC?

@anijain2305 (Contributor) commented:

We have made good progress on the quantization RFC, achieving clarity and convergence on many points.
For this PR specifically, @tqchen and @FrozenGene, could you please comment on whether this looks in line with our quantization RFC?

const auto scale = MakeConstantScalar(Float(32), attrs->output_scale);
const int32_t min_val = get_qmin(out_dtype);
const int32_t max_val = get_qmax(out_dtype);
auto scale_data = Cast(Round(Divide(data, scale)), Int(32));
Review comment (Member):

As discussed in the RFC, this should be round_away_from_zero, i.e. std::round in C++. What about this Round here? I don't see the definition of this Round.

Review comment (Contributor):

This is the Relay Round operator, which translates to the LLVM round intrinsic:

https://llvm.org/docs/LangRef.html#llvm-round-intrinsic

The comment at the above link says that it has the same behavior as the libm round function, which is defined here:
https://sourceware.org/newlib/libm.html#lround

This is round-away-from-zero.
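To make the distinction concrete (illustrative only, not code from the PR): std::round rounds halfway cases away from zero, whereas std::nearbyint under the default rounding mode rounds halfway cases to the nearest even value, which is why the choice of rounding intrinsic matters for quantization.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // std::round: halfway cases go away from zero (round-away-from-zero).
  // std::nearbyint: uses the current rounding mode, by default
  // round-to-nearest-even, so halfway cases go to the even neighbor.
  for (double x : {0.5, 1.5, 2.5, -0.5, -2.5}) {
    std::printf("x = %4.1f  round = %4.1f  nearbyint = %4.1f\n",
                x, std::round(x), std::nearbyint(x));
  }
  return 0;
}
```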

Review comment (Member):

OK, that makes sense.

@shoubhik (Contributor Author) commented:

@FrozenGene and @tqchen, any other major comments for the PR?

@tqchen (Member) commented Jul 12, 2019:

Mainly organizational issues, please make things consistent with what was discussed in #3531

Follow-up commit: Correcting the file paths as suggested in the reviews.
@shoubhik (Contributor Author) commented:

> Mainly organizational issues, please make things consistent with what was discussed in #3531

I have addressed the namespace issues and have followed the same convention as #3531 in the new commit.

Resolved (outdated) review threads on:
  python/tvm/relay/qnn/ir_pass.py
  python/tvm/relay/qnn/op/__init__.py
  python/tvm/relay/qnn/__init__.py
  python/tvm/relay/qnn/_qnn.py
  python/tvm/relay/qnn/op/_make.py
  python/tvm/relay/qnn/op/qnn.py
@shoubhik (Contributor Author) commented:

@liangfu, made the changes you suggested.

@liangfu (Member) left a comment:

LGTM

@zhenhuaw-me (Contributor) left a comment:

I have not checked the lowering code. Basically two things:

  1. Do Quantize and Dequantize handle 16-bit data types, with regard to PR #3531 ([QNN] Requantize operator)?
  2. This PR review may need to be suspended until PR #3531 is merged, as there would be many rebase tasks to do.

DataType out_dtype;

TVM_DECLARE_ATTRS(QuantizeAttrs, "relay.attrs.QuantizeAttrs") {
TVM_ATTR_FIELD(out_dtype)
Review comment (Contributor):

I see that PR #3531 accepts quantized tensor data types of 8/16 bits; are we going to align?
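For reference, a hedged sketch of how an attrs declaration of this shape is typically completed in TVM; the surrounding struct, include path, and describe text here are illustrative assumptions, not necessarily the PR's actual definition:

```cpp
#include <tvm/attrs.h>  // header path as of the 2019 codebase (assumption);
                        // the relay headers defining DataType are also needed.

namespace tvm {
namespace relay {

// Illustrative sketch only; not the PR's actual QuantizeAttrs.
struct QuantizeAttrs : public tvm::AttrsNode<QuantizeAttrs> {
  DataType out_dtype;

  TVM_DECLARE_ATTRS(QuantizeAttrs, "relay.attrs.QuantizeAttrs") {
    TVM_ATTR_FIELD(out_dtype)
        .describe("Output data type of the quantized values, e.g. int8 or uint8 "
                  "(or 16-bit types if aligned with #3531).");
  }
};

}  // namespace relay
}  // namespace tvm
```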

namespace tvm {
namespace relay {

inline bool IsInt8(const DataType& dtype) {
Review comment (Contributor):

hmmmmmm, do we really need such utils?

return dtype == Float(32);
}

inline bool IsQuantizedType(const DataType& dtype) {
Review comment (Contributor):

Naming conflicts against PR #3531 (also in many other files). Should we hold this review until #3531 is merged?

Review comment (Contributor Author):

Yes, I am waiting for #3531 to get merged. It seems to be in flux for now. Once it is merged, I will make the changes in this one.

case QuantizeOpType::Quantize:
return IsFloat32(in_dtype);
case QuantizeOpType ::Dequantize:
return IsQuantizedType(in_dtype);
Review comment (Contributor):

The type check seems to go against your API definition.

const DataType& in_dtype) {
switch (op_type) {
case QuantizeOpType::Quantize:
return IsQuantizedType(in_dtype);
Review comment (Contributor):

The type check seems to go against your API definition. Also, @anijain2305, do we need to align?

I think, if someone defines an OpType and conversions, it is their responsibility to include that code in one unified PR. Otherwise, don't include the OpType definition or conversions.

Review comment (Contributor Author):

I agree, let's finish #3531 and then I will update this one.

}
}

inline const int32_t GetQmin(const DataType& dtype) {
Review comment (Contributor):

The implementation seems to diverge from PR #3531, whose code is simpler.
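Just to illustrate the kind of simpler helper being referred to (an assumption on my part, not the actual code in #3531), a bit-width-driven fragment could look like:

```cpp
// Illustrative fragment only: qmin derived from the dtype's bit width
// instead of enumerating each supported type.
inline int32_t GetQmin(const DataType& dtype) {
  CHECK_LE(dtype.bits(), 32) << "Only dtypes up to 32 bits are handled here";
  if (dtype.is_int()) {
    return static_cast<int32_t>(-(1LL << (dtype.bits() - 1)));  // e.g. int8 -> -128
  } else if (dtype.is_uint()) {
    return 0;  // unsigned types start at 0
  }
  LOG(FATAL) << "Type not supported";
  return -1;
}
```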

} else if (IsUint32(dtype)) {
return std::numeric_limits<uint32_t>::min();
}
LOG(FATAL) << "Type not supported\n";
Review comment (Contributor):

Maybe include this in an else branch, as sketched below.
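A minimal sketch of that suggestion, reusing the names from the fragment above (a fragment only, not a full function):

```cpp
  } else if (IsUint32(dtype)) {
    return std::numeric_limits<uint32_t>::min();
  } else {
    // Unsupported dtypes end up here instead of falling through.
    LOG(FATAL) << "Type not supported";
    return -1;
  }
```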

return -1;
}

inline const int32_t GetQmax(const DataType& dtype) {
Review comment (Contributor):

same as GetQmin().

@zhenhuaw-me (Contributor) commented:

Regarding 16/32-bit quantization, I have a discussion in RFC #3591.

@shoubhik (Contributor Author) commented:

There are quite a lot of changes here that are dependent on #3531. I am closing the PR for now. I will reopen this once #3531 is pushed.

@shoubhik closed this on Jul 25, 2019.
@anijain2305 deleted the quantize-dequantize branch on November 13, 2019.