[TFLite] QNN support for TFLite 2.1.0 quantized models #5848
Conversation
Force-pushed from 0bbef96 to bd1cc7c.
Opening the PR for review. @siju-samuel @masahi @u99127 @tqchen The CI fails because of a missing package, tensorflow_hub.
Very quick review before I disappear for the day.
I think this is on the right track but needs some cleanups and some clarifications.
Dmitriy @d-smirnov, could you also cast your eye over this, since you've been looking at this internally?
python/tvm/relay/frontend/tflite.py (Outdated)
scale = tflite_scale
# Ensure that all zero points are identical
zero_point = tflite_zero_point
assert all(x == zero_point[0] for x in zero_point)
Minor nit: can we use an error here instead of an assert, to show us clearly the change that has happened? It also means we can provide some sensible diagnostic.
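A possible shape for this, as a sketch only (the error class and message wording are assumptions, not the code that was eventually merged):

    if not all(x == zero_point[0] for x in zero_point):
        # Raise a diagnosable error instead of a bare assert (hypothetical wording)
        raise tvm.error.OpAttributeInvalid(
            "TFLite per-axis quantization expects all zero points to be identical, "
            "got {}".format(zero_point))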
python/tvm/relay/frontend/tflite.py (Outdated)
# Params might be per-tensor or per-axis quantized. For per-tensor, scale and zero
# points are scalar. For per-axis, scale and zero points are tensors. But as per
# the TFLite quantization spec, the restrictions on ops suggest that for per-axis,
# even if the zero point is a tensor - all the zero points are identical. More
# information here - https://www.tensorflow.org/lite/performance/quantization_spec
To be clear, we are interpreting this from the fact that Conv2d and Depthwise_conv2d have a zero_point of 0 listed in their restrictions even though they have per-axis quantization.
I would make the comment more explicit.
For per-axis (or per-channel) quantization, the scale and zero points for the weights are tensors(?)
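A more explicit version of that comment might read something like this (a sketch of the wording, not the text that was merged):

    # Per-axis (per-channel) quantization applies only to weight tensors, where
    # the scale is a tensor with one entry per output channel. The TFLite
    # quantization spec restricts Conv2d/Depthwise_conv2d weights to
    # zero_point == 0 on every axis, so even when the zero point is a tensor,
    # all its entries are identical.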
input_tensors = self.get_input_tensors(op)
assert len(input_tensors) == 1, "input tensors length should be 1"
input_tensor = input_tensors[0]
input_tensor_type_str = self.get_tensor_type_str(input_tensor.tensor.Type())
in_expr = self.get_expr(input_tensor.tensor_idx)

output_tensors = self.get_output_tensors(op)
assert len(output_tensors) == 1, "output tensors length should be 1"
output_tensor = output_tensors[0]
output_tensor_type_str = self.get_tensor_type_str(output_tensor.tensor.Type())

# The output must be quantized
assert output_tensor.qnn_params

# TFLite Quantize op can also act as Requantize op
if input_tensor_type_str == "float32":
    # Quantize the float32 input
    out = self.quantize(in_expr, output_tensor)
else:
    out = _qnn.op.requantize(in_expr,
                             input_scale=input_tensor.qnn_params['scale'],
                             input_zero_point=input_tensor.qnn_params['zero_point'],
                             output_scale=output_tensor.qnn_params['scale'],
                             output_zero_point=output_tensor.qnn_params['zero_point'],
                             out_dtype=output_tensor_type_str)
return out
This to me looks like it can go in in its own right as a separate PR, but this needs a unit test change in tflite/test_forward.py.
You are right. I will add a test case in this PR. This will enable us to keep those 5 end-to-end tests as well.
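For reference, such a test would presumably force the TFLite converter to emit quantize ops via post-training quantization. The sketch below shows that general approach; the model, shapes, and converter flags are illustrative assumptions, not the merged test:

    import numpy as np
    import tensorflow as tf

    # Dummy Keras model; converting with a representative dataset makes the
    # TFLite converter insert quantize ops around the int8 graph.
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(16, 16, 3)),
        tf.keras.layers.AveragePooling2D(pool_size=(2, 2)),
    ])

    def representative_dataset():
        for _ in range(8):
            yield [np.random.uniform(0, 1, (1, 16, 16, 3)).astype("float32")]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    tflite_model_buf = converter.convert()  # bytes, ready for the TVM frontend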
python/tvm/relay/frontend/tflite.py (Outdated)
shape = tensor_wrapper.tensor.ShapeAsNumpy()

# Set shape to 1 if the data is a scalar type
if data.shape == (1,) and isinstance(shape, int) and shape == 0:
I'm scratching my head at this condition with shape. Can you elaborate more as to why we need it?
I modified the comments. Can you take a look now? This is sort of a corner case when the TFLite buffer is a scalar.
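For what it's worth, the corner case appears to be the following (a runnable illustration under my reading of the flatbuffers API, where ShapeAsNumpy() returns the int 0 when the shape vector is empty):

    import numpy as np

    buf = np.int8(7).tobytes()                 # a scalar TFLite buffer
    data = np.frombuffer(buf, dtype=np.int8)   # decodes to shape (1,)
    shape = 0                                  # what ShapeAsNumpy() yields for a rank-0 tensor

    # Promote the rank-0 scalar to a 1-element tensor so a later reshape works
    if data.shape == (1,) and isinstance(shape, int) and shape == 0:
        shape = (1,)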
Thanks @u99127 for the review. They all make sense. I will add clarifications.
Force-pushed from 5aa8eac to 05fad24.
@u99127 @siju-samuel CI passes now. Can you please take another look? There were more changes, mostly to add unit-level tests.
python/tvm/relay/frontend/tflite.py (Outdated)
        zero_point=zero_point_val,
        dtype=output_tensor_type_str)
else:
    out = _op.clip(in_expr, a_min=0, a_max=6)
This should be relu, not clip.
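Presumably the float path of convert_relu would then become something like this one-line sketch (using the same _op/in_expr names as the quoted diff; not the merged change):

    out = _op.nn.relu(in_expr)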
# Define a dummy model
# Keras model to force TFLite converter to insert 2 TFLite quantize ops.
# First TFLite quantize op converts float32 tensor to int8 tensor - Qnn quantize.
# Second TLite quantize op converts int8 tensor to int8 tensor - Qnn requantize.
TLite -> TFLite
LGTM. Any more changes pending?
@u99127 ping
Sorry about the delay - it's been a bit busy.
python/tvm/relay/frontend/tflite.py (Outdated)
if data.size == 1 and isinstance(shape, int) and shape == 0:
    shape = (1,)

if tensor_wrapper.tensor.Type() == TensorType.INT8:
Minor nit, and this should really be credited to Dmitriy Smirnov: https://github.com/d-smirnov
The condition here could well be pulled out into a helper function that has a dictionary to map from TensorType to numpy type? That would make the code much cleaner and reduce duplication.
i.e. something like:
def get_tensor_type_as_numpy(self, tensor_wrapper):
    """Returns np.dtype out of TensorType"""
    assert isinstance(tensor_wrapper, TensorWrapper)
    try:
        from tflite.TensorType import TensorType
        return {TensorType.UINT8: np.uint8,
                TensorType.INT8: np.int8,
                TensorType.FLOAT32: np.float32,
                TensorType.INT32: np.int32,
                TensorType.INT64: np.int64,
                TensorType.BOOL: np.bool_}[tensor_wrapper.tensor.Type()]
    except ImportError:
        raise ImportError("The tflite package must be installed")
    except KeyError:
        raise NotImplementedError("Tensor type '{}' currently not supported"
                                  .format(tensor_wrapper.tensor.Type()))

def get_tensor_value(self, tensor_wrapper):
    """Get tensor buffer value from given tensor wrapper"""
    assert isinstance(tensor_wrapper, TensorWrapper)
    value_type = self.get_tensor_type_as_numpy(tensor_wrapper)
    return np.frombuffer(tensor_wrapper.buffer.DataAsNumpy(), dtype=value_type).reshape(
        tensor_wrapper.tensor.ShapeAsNumpy()
        if tensor_wrapper.tensor.ShapeLength() != 0
        else [])
Please consider the following change as well:
def has_same_qnn_params(self, lhs_tensor, rhs_tensor):
    lhs_scale = lhs_tensor.qnn_params['scale']
    rhs_scale = rhs_tensor.qnn_params['scale']
    lhs_zero_point = lhs_tensor.qnn_params['zero_point']
    rhs_zero_point = rhs_tensor.qnn_params['zero_point']
    return np.allclose(lhs_scale.data.asnumpy(), rhs_scale.data.asnumpy(),
                       rtol=1e-5, atol=1e-5) and \
           np.allclose(lhs_zero_point.data.asnumpy(), rhs_zero_point.data.asnumpy(),
                       rtol=1e-5, atol=1e-5)
For the first comment, thanks, let me take a look.
For the second suggestion, has_same_qnn_params, I think we do not need that. For all the ops where we have to check that the params are the same, the scale and zero point are scalars. This is because per-axis quantization is limited to weights, and thus limited to the conv2d and dense ops, where we do not need this check.
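To illustrate the distinction being relied on here (hypothetical values, not taken from a real model):

    import numpy as np

    # Per-tensor params (what most ops carry): scalars
    scale, zero_point = 0.0235, 0

    # Per-axis params (conv2d/dense weights only): one scale per output channel;
    # the spec pins every zero point to 0, so a scalar check suffices elsewhere
    scales = np.array([0.0235, 0.0310, 0.0198], dtype=np.float32)
    zero_points = np.array([0, 0, 0], dtype=np.int32)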
try:
    from tflite.ActivationFunctionType import ActivationFunctionType
except ImportError:
    raise ImportError("The tflite package must be installed")
I think this is unnecessary given the import of ActivationFunctionType in the constructor here.
Tried this, but it failed: the scope of imports is limited to the functions in which they are imported.
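That matches Python's scoping rules for imports inside methods; a minimal standalone illustration (names are made up):

    class Demo:
        def __init__(self):
            from math import pi  # 'pi' is a local name inside __init__ only

        def area(self, radius):
            return pi * radius * radius  # NameError: 'pi' is not visible here

    Demo().area(1.0)  # raises NameError, mirroring the failure described above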
@@ -692,6 +773,11 @@ def _hard_swish(data):

def convert_relu6(self, op):
    """Convert TFLite ReLU6"""
    try:
        from tflite.ActivationFunctionType import ActivationFunctionType
Same as relu: I think this is unnecessary given the import of ActivationFunctionType in the constructor here.
Same as before.
_test_convolution([4, 17, 17, 124], [1, 1, 124, 1], [1, 1], [1, 1], 'SAME', 'NHWC', True, quantized=quantized)
_test_convolution([4, 17, 17, 12], [3, 3, 12, 1], [1, 1], [2, 2], 'VALID', 'NHWC', True, quantized=quantized)
_test_convolution([4, 17, 17, 12], [3, 3, 12, 2], [1, 1], [2, 2], 'VALID', 'NHWC', True, quantized=quantized)
# dephtwise convolution with single input channel
dephtwise / depthwise.
@u99127 Can you please take a look again?
LGTM.
LGTM and thanks for your patience.
@siju-samuel Please approve and merge if it looks good :)
LGTM.
Thanks @anijain2305 @u99127 @tqchen. This is now merged.
* [TFLite] TFLite 2.x parser quantization support.
* Address comments. Fix a bug for depthwise conv.
* Added tests for relu, conv, quantize. Address comments.
* Using web-data. Minor refactoring.
* Removing TF hub package.
* Trigger CI.
* Handle TFLite input layer naming.
* Addressing reviews.
* Retrigger CI.
TFLite 2.1.0 has changed the quantization spec from older TFLite versions. The spec is available at https://www.tensorflow.org/lite/performance/quantization_spec
Key changes
please advise here).