
[TFLite] QNN support for TFLite 2.1.0 quantized models #5848

Merged: 9 commits into apache:master on Jul 3, 2020

Conversation

anijain2305 (Contributor) commented Jun 18, 2020

TFLite 2.1.0 has changed the quantization spec from older TFLite versions. The spec is available at https://www.tensorflow.org/lite/performance/quantization_spec

Key changes

  • No TFLite 2.0 hosted models. We need a way to create quantized models, hence a dependency on the tensorflow_hub package (@tqchen please advise here).
  • Per-axis (aka per-channel) quantization (see the sketch below)
  • Int8 datatypes
  • Extra restrictions per op regarding scales and zero points
  • The TFLite Quantize op also works as Requantize
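
For illustration, a minimal sketch of what per-axis quantization looks like in a TFLite 2.x flatbuffer, using the flatbuffer-generated tflite package; the model path here is a placeholder, not part of this PR:

    # Sketch: inspect per-axis quantization params in a TFLite 2.x model.
    # Per the quantization spec, conv weights carry one scale per output
    # channel while all their zero points are 0.
    from tflite.Model import Model

    buf = open("model.tflite", "rb").read()  # placeholder path
    model = Model.GetRootAsModel(buf, 0)
    subgraph = model.Subgraphs(0)

    for i in range(subgraph.TensorsLength()):
        tensor = subgraph.Tensors(i)
        qnn = tensor.Quantization()
        if qnn is not None and qnn.ScaleLength() > 1:
            # Per-axis tensor: one scale per slice along QuantizedDimension()
            print(tensor.Name(), qnn.ScaleAsNumpy(), qnn.ZeroPointAsNumpy(),
                  "axis:", qnn.QuantizedDimension())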

anijain2305 force-pushed the tf2_qnn branch 3 times, most recently from 0bbef96 to bd1cc7c on June 24, 2020 09:10
anijain2305 changed the title from "[TFLite] TFLite 2.x parser quantization support." to "[TFLite] QNN support for TFLite 2.1.0 quantized models" on Jun 24, 2020
anijain2305 marked this pull request as ready for review on June 24, 2020 18:00
anijain2305 (Contributor Author):

Opening the PR for review. @siju-samuel @masahi @u99127 @tqchen

The CI fails because of a missing package - tensorflow_hub

u99127 (Contributor) left a comment:

Very quick review before I disappear for the day.

I think this is on the right track but needs some cleanups and some clarifications.

Dmitriy @d-smirnov, could you also cast your eye over this, since you've been looking at this internally?

scale = tflite_scale
# Ensure that all zero points are identical
zero_point = tflite_zero_point
assert all(x == zero_point[0] for x in zero_point)
Contributor:

Minor nit: can we use an error here instead of an assert, to show us clearly the change that has happened? It would also mean we can provide a sensible diagnostic.
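
For illustration, a sketch of the kind of diagnostic the assert could become (a sketch only, not the code the PR ended up with; tvm.error.OpNotImplemented is an existing TVM frontend error class, and the surrounding names follow the snippet above):

    # Raise a diagnostic error instead of a bare assert when the
    # per-axis zero points are not all identical.
    import tvm

    zero_point = tflite_zero_point
    if not all(x == zero_point[0] for x in zero_point):
        raise tvm.error.OpNotImplemented(
            "TFLite per-axis quantization with non-identical zero points "
            "is not supported, got zero points {}".format(zero_point))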

Comment on lines 246 to 250
# Params might be per-tensor or per-axis quantized. For per-tensor, scale and zero
# points are scalar. For per-axis, scale and zero points are tensors. But as per the
# TFLite quantization spec, the restrictions on ops suggest that for per-axis, even
# if the zero point is a tensor, all the zero points are identical. More information
# here - https://www.tensorflow.org/lite/performance/quantization_spec
Contributor:

To be clear, we are interpreting this from the fact that Conv2d and Depthwise_conv2d have a zero_point of 0 listed in their restrictions even though they have per-axis quantization.

I would make the comment more explicit.

For per-axis (or per-channel) quantization, the scale and zero points for the weights are tensors(?)

Comment on lines 2613 to 2707
input_tensors = self.get_input_tensors(op)
assert len(input_tensors) == 1, "input tensors length should be 1"
input_tensor = input_tensors[0]
input_tensor_type_str = self.get_tensor_type_str(input_tensor.tensor.Type())
in_expr = self.get_expr(input_tensor.tensor_idx)

output_tensors = self.get_output_tensors(op)
assert len(output_tensors) == 1, "output tensors length should be 1"
output_tensor = output_tensors[0]
output_tensor_type_str = self.get_tensor_type_str(output_tensor.tensor.Type())

# The output must be quantized
assert output_tensor.qnn_params

# The TFLite Quantize op can also act as a Requantize op
if input_tensor_type_str == "float32":
    # Quantize the float32 input
    out = self.quantize(in_expr, output_tensor)
else:
    # The input is already quantized; requantize to the output params
    out = _qnn.op.requantize(in_expr,
                             input_scale=input_tensor.qnn_params['scale'],
                             input_zero_point=input_tensor.qnn_params['zero_point'],
                             output_scale=output_tensor.qnn_params['scale'],
                             output_zero_point=output_tensor.qnn_params['zero_point'],
                             out_dtype=output_tensor_type_str)
return out
Contributor:

This looks to me like it could go in as a separate PR in its own right, but it needs a unit test change in tflite/test_forward.py.

Contributor Author:

You are right. I will add a test case in this PR. This will also enable us to keep those 5 end-to-end tests.

Contributor Author:

[screenshot of the added test case]

The test case above was added to force both types of quantize nodes.
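
For reference, a minimal sketch of such a test model, assuming the TF 2.x Keras and TFLiteConverter APIs (the exact test added by the PR lives in tests/python/frontend/tflite/test_forward.py):

    import numpy as np
    import tensorflow as tf

    # Dummy Keras model; post-training int8 quantization can make the
    # converter emit a float32->int8 quantize at the boundary (QNN quantize)
    # and an int8->int8 quantize where scales change (QNN requantize).
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(8),
    ])

    def representative_dataset():
        for _ in range(100):
            yield [np.random.uniform(size=(1, 8)).astype("float32")]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    tflite_model = converter.convert()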

shape = tensor_wrapper.tensor.ShapeAsNumpy()

# Set shape to 1 if the data is a scalar type
if data.shape == (1,) and isinstance(shape, int) and shape == 0:
Contributor:

I'm scratching my head at this condition on shape. Can you elaborate on why we need it?

Contributor Author:

I modified the comments. Can you take a look now? This is a corner case that arises when the TFLite buffer is a scalar.
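
To illustrate the corner case (a sketch; names follow the snippet above): for a rank-0 (scalar) tensor the flatbuffer shape vector is absent, and the generated ShapeAsNumpy() accessor then returns the int 0 rather than an array, so the shape has to be normalized.

    shape = tensor_wrapper.tensor.ShapeAsNumpy()
    # Scalar tensor: shape vector absent, ShapeAsNumpy() returned int 0.
    # Normalize to (1,) so downstream reshapes keep working.
    if isinstance(shape, int) and shape == 0:
        shape = (1,)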

anijain2305 (Contributor Author):

Thanks @u99127 for the review. It all makes sense. I will add the clarifications.

anijain2305 force-pushed the tf2_qnn branch 3 times, most recently from 5aa8eac to 05fad24 on June 25, 2020 21:32
anijain2305 (Contributor Author):

@u99127 @siju-samuel CI passes now. Can you please take another look? There were more changes, mostly to add unit-level tests.

    zero_point=zero_point_val,
    dtype=output_tensor_type_str)
else:
    out = _op.clip(in_expr, a_min=0, a_max=6)
Member:

This should be relu, not clip

# Define a dummy model
# Keras model to force TFLite converter to insert 2 TFLite quantize ops.
# First TFLite quantize op converts float32 tensor to int8 tensor - Qnn quantize.
# Second TLite quantize op converts int8 tensor to int8 tensor - Qnn requantize.
Member:

TLite -> TFLite

siju-samuel (Member):

LGTM. Any more changes pending?

anijain2305 (Contributor Author):

@u99127 ping

u99127 (Contributor) left a comment:

Sorry about the delay - it's been a bit busy.

if data.size == 1 and isinstance(shape, int) and shape == 0:
    shape = (1,)

if tensor_wrapper.tensor.Type() == TensorType.INT8:
Contributor:

Minor nit, and this should really be credited to Dmitriy Smirnov (https://github.com/d-smirnov).

The condition here could well be pulled out into a helper function that uses a dictionary to map from TensorType to numpy types. That would make the code much cleaner and reduce duplication.

i.e. something like

def get_tensor_type_as_numpy(self, tensor_wrapper):
    """Returns np.dtype out of TensorType"""
    assert isinstance(tensor_wrapper, TensorWrapper)

    try:
        from tflite.TensorType import TensorType
        return {TensorType.UINT8:   np.uint8,
                TensorType.INT8:    np.int8,
                TensorType.FLOAT32: np.float32,
                TensorType.INT32:   np.int32,
                TensorType.INT64:   np.int64,
                TensorType.BOOL:    np.bool_}[tensor_wrapper.tensor.Type()]
    except ImportError:
        raise ImportError("The tflite package must be installed")
    except KeyError:
        raise NotImplementedError("Tensor type '{}' currently not supported"
                                  .format(tensor_wrapper.tensor.Type()))

def get_tensor_value(self, tensor_wrapper):
    """Get tensor buffer value from given tensor wrapper"""
    assert isinstance(tensor_wrapper, TensorWrapper)

    value_type = self.get_tensor_type_as_numpy(tensor_wrapper)
    return np.frombuffer(tensor_wrapper.buffer.DataAsNumpy(), dtype=value_type).reshape(
        tensor_wrapper.tensor.ShapeAsNumpy()
        if 0 != tensor_wrapper.tensor.ShapeLength()
        else [])

Contributor:

Please consider the following change as well:

def has_same_qnn_params(self, lhs_tensor, rhs_tensor):
    lhs_scale = lhs_tensor.qnn_params['scale']
    rhs_scale = rhs_tensor.qnn_params['scale']
    lhs_zero_point = lhs_tensor.qnn_params['zero_point']
    rhs_zero_point = rhs_tensor.qnn_params['zero_point']
    return np.allclose(lhs_scale.data.asnumpy(), rhs_scale.data.asnumpy(),
                       rtol=1e-5, atol=1e-5) and \
           np.allclose(lhs_zero_point.data.asnumpy(), rhs_zero_point.data.asnumpy(),
                       rtol=1e-5, atol=1e-5)


Contributor Author:

For the first comment, thanks, let me take a look.

For the second suggestion, has_same_qnn_params, I think we do not need it. For all the ops where we have to check that the params are the same, the scale and zero point are scalars. This is because per-axis quantization is limited to weights, and thus to the conv2d and dense ops, where we do not need this check.

Comment on lines +704 to +708
try:
from tflite.ActivationFunctionType import ActivationFunctionType
except ImportError:
raise ImportError("The tflite package must be installed")

Contributor:

I think this is unnecessary given the import of ActivationFunctionType in the constructor here

Contributor Author:

Tried this, but it failed; the scope of an import is limited to the function in which it is performed.
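
A toy illustration of that scoping rule (plain Python, not TVM code): an import executed inside one method binds a local name there only, so other methods cannot see it.

    class Converter:
        def __init__(self):
            import math  # bound only in __init__'s local scope

        def use(self):
            return math.pi  # NameError: name 'math' is not defined

    Converter().use()  # raises NameError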

@@ -692,6 +773,11 @@ def _hard_swish(data):

def convert_relu6(self, op):
    """Convert TFLite ReLU6"""
    try:
        from tflite.ActivationFunctionType import ActivationFunctionType
Contributor:

Same as for relu: I think this is unnecessary given the import of ActivationFunctionType in the constructor here.

Contributor Author:

Same as before

_test_convolution([4, 17, 17, 124], [1, 1, 124, 1], [1, 1], [1, 1], 'SAME', 'NHWC', True, quantized=quantized)
_test_convolution([4, 17, 17, 12], [3, 3, 12, 1], [1, 1], [2, 2], 'VALID', 'NHWC', True, quantized=quantized)
_test_convolution([4, 17, 17, 12], [3, 3, 12, 2], [1, 1], [2, 2], 'VALID', 'NHWC', True, quantized=quantized)
# dephtwise convolution with single input channel
Contributor:

dephtwise / depthwise.

anijain2305 (Contributor Author):

@u99127 can you please take a look again?

u99127 (Contributor) commented Jul 2, 2020

> @u99127 can you please take a look again?

LGTM.

u99127 (Contributor) left a comment:

LGTM and thanks for your patience.

anijain2305 (Contributor Author):

@siju-samuel Please approve and merge if it looks good :)

siju-samuel (Member) left a comment:

LGTM.

siju-samuel merged commit 575a383 into apache:master Jul 3, 2020
siju-samuel (Member):

Thanks @anijain2305 @u99127 @tqchen. This is now merged.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jul 14, 2020
* [TFLite] TFLite 2.x parser quantization support.

* Address comments. Fix a bug for depthwise conv

* Added tests for relu, conv, quantize. Address comments.

* Using web-data. Minor refactoring.

* Removing TF hub package

* Trigger CI.

* Handle TFLite input layer naming.

* Addressing reviews.

* Retrigger CI.
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jul 14, 2020
(same commit list as above)