[QNN] Making scale/zero_points as expr instead of attrs. #4611
Conversation
Please review @zhiics @FrozenGene @jackwish @shoubhik @u99127. You might already be using QNN; this change will affect you if you read QNN graphs directly. Please review! @vinx13, FixedPointMultiply is shared between QNN and Automatic Quantization, but I deliberately kept FixedPointMultiply unchanged, so this PR does not affect automatic quantization. Please review!
LGTM. Thank you for the impressive work @anijain2305, that must be a lot of hard work :)
python/tvm/relay/frontend/tflite.py (outdated)
@@ -224,9 +226,21 @@ def get_tensor_type_str(self, tensor_type):
        raise NotImplementedError("Tensor type {} is currently not supported"
                                  .format(str(tensor_type)))

    def is_scalar(self, expr):
Is anyone using this?
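For context, a minimal sketch of what such a helper could look like; the body below is hypothetical (the diff above only shows the signature) and assumes a "scalar" means a Relay Constant backed by a single-element array:

import numpy as np
from tvm.relay import expr as _expr

def is_scalar(expr):
    # Hypothetical body, written as a free function for a self-contained
    # sketch: treat a Relay Constant whose backing NDArray holds exactly
    # one element (including rank-0) as a scalar.
    if not isinstance(expr, _expr.Constant):
        return False
    return np.prod(expr.data.asnumpy().shape) == 1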
import tvm
from tvm import relay
from .. import op as reg
from ... import expr as _expr


def get_scalar_from_constant(expr):
Duplicated with the one in frontend/common.py?
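As a sketch, the duplicated helper plausibly looks like the following; the body is an assumption based on the name and the surrounding discussion (extract the single value from a constant scalar expr), not copied from the PR. It reuses the _expr alias imported in the snippet above:

def get_scalar_from_constant(expr):
    # Assumed behavior: expr must be a Relay Constant holding exactly
    # one value; return that value as a plain Python scalar.
    assert isinstance(expr, _expr.Constant), "expected a Constant"
    value = expr.data.asnumpy()
    assert value.size == 1, "expected a constant scalar"
    return value.item()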
src/relay/qnn/op/concatenate.cc (outdated)
RELAY_ERROR("qnn concatenate requires a tuple of scales as the second argument, found "
            << PrettyPrint(types[1])));
}
for (auto input_scale : input_scales_tuple->fields) {
Could be auto&?
src/relay/qnn/op/concatenate.cc (outdated)
RELAY_ERROR("qnn concatenate requires a tuple of zero_points as the third argument, found "
            << PrettyPrint(types[2])));
}
for (auto input_zero_point : input_zero_points_tuple->fields) {
Could be auto&?
const auto& is maybe better. We don't modify the value and just do a check.
6cbf86f to 9b5ddf2
On behalf of @u99127, lgtm.
Thanks @anijain2305 @FrozenGene @Leo-arm @jackwish, this is merged.
Currently, the QNN dialect only deals with uniform (per-tensor) quantization, where each tensor has just one scale and one zero point. Because of this restriction, the QNN design kept scale and zero point as op attributes. However, as we move towards channel quantization, the scale will become a vector (and behave similarly to the bias input of bias_add in terms of op inputs).
Before moving to channel quantization, this PR makes the necessary changes to pass scale and zero points as input exprs to operators (instead of making them op attrs). So, this PR does not bring any functional or performance change to QNN graphs: the new type checks still require that scale and zero points are constant scalar values. Once this PR is merged and we start moving towards channel-wise quantization, we can start relaxing the checks and modifying the lowering.
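As a rough illustration of the API shape after this change, here is a minimal sketch using qnn.requantize; the specific op and concrete values are illustrative rather than taken from the PR, but the point it shows is that quantization parameters are now passed as Relay expressions (constant scalars today, potentially 1-D vectors once per-channel support lands) instead of op attributes:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 8), dtype="int8")

# Quantization params are now inputs (Relay exprs), not op attributes.
# The type checker still requires constant scalars at this stage; with
# per-channel quantization the scale can later become a vector.
input_scale = relay.const(0.5, "float32")
input_zero_point = relay.const(0, "int32")
output_scale = relay.const(0.25, "float32")
output_zero_point = relay.const(0, "int32")

out = relay.qnn.op.requantize(data,
                              input_scale, input_zero_point,
                              output_scale, output_zero_point,
                              out_dtype="int8")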