
[OSNet] int8 tflite model - catastrophic accuracy degradation #444

Closed
MustafaYounes1 opened this issue Aug 8, 2023 · 16 comments
Labels: discussion, OP:MaxPool, Quantization, third party


MustafaYounes1 commented Aug 8, 2023

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.1

onnxruntime version number

1.15.0

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_quant_issue.zip

Parameter Replacement JSON

None

Description

Source Model Information

OSNet is a person re-ID model that was trained using PyTorch and converted to ONNX with the pre-trained ImageNet weights.

onnx2tf conversion command

onnx2tf \
-i ./../onnx_models/040_osnet_x1_0/onnx_fp_32_bs_1.onnx \
-o ./../tflite_models/040_osnet_x1_0/osnet_x1_0_bs_1/ \
-otfv1pb \
-osd \
-oiqt \
-qt per-tensor \
-cind "input.1" "./../calibration_data/onnx2tf_calib/calib_data_duke_500_bs_1_nhwc_fp32.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"

The quantization process was calibrated using 100 samples from the DukeMTMC person re-ID dataset; the samples were normalized to [0, 1] and preprocessed accordingly.
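
For reference, here is a minimal sketch of how a calibration file like the one above might be produced (the folder name and the assumed 256x128 input resolution are hypothetical; the exact array shape expected by -cind should be checked against the onnx2tf README):

    import glob

    import numpy as np
    from PIL import Image

    H, W = 256, 128  # assumed OSNet input resolution (height, width)
    paths = sorted(glob.glob("duke_crops/*.jpg"))  # hypothetical folder of DukeMTMC crops

    samples = []
    for p in paths:
        img = Image.open(p).convert("RGB").resize((W, H))  # PIL resize takes (width, height)
        samples.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]

    # (N, H, W, 3) float32 NHWC array; the mean/std passed via -cind are applied by onnx2tf
    np.save("calib_data_duke_500_bs_1_nhwc_fp32.npy", np.stack(samples))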

Issue Description

I checked the accuracy of the converted float32 tflite model, and it was pretty much the same as the source model. However, when I checked the accuracy of the int8 model, I encountered a catastrophic accuracy drop (more than 95%).

I read Section 7 of the README file, and it clearly states that this could be a matter of the model structure. Is there any way to fix this problem?

Resources

You can find the following resources in the attached zip file:

  1. osnet_x1_0_fp_32_bs_1.onnx : the source ONNX model.
  2. osnet_x1_0_imagenet_fp32_bs_1_float32.tflite: output fp32 tflite model.
  3. osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite: the output int8 tflite model.
  4. accuracy_check.py: a Python script that takes the fp32/int8 tflite models and an input image, runs both models on it, and measures the cosine similarity of the output embeddings; this simplifies the accuracy check on your end (a minimal sketch follows this list).
  5. 0001_c6_f0030809.jpg: an input image sample from the DukeMTMC dataset.
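
For orientation, here is a sketch of the kind of check accuracy_check.py performs (this is not the attached script; the input size is an assumption, and it assumes the _integer_quant variant keeps float32 inputs/outputs):

    import numpy as np
    import tensorflow as tf
    from PIL import Image

    def embed(model_path: str, x: np.ndarray) -> np.ndarray:
        """Run one TFLite model and return the flattened output embedding."""
        interp = tf.lite.Interpreter(model_path=model_path)
        interp.allocate_tensors()
        inp = interp.get_input_details()[0]
        out = interp.get_output_details()[0]
        interp.set_tensor(inp["index"], x.astype(inp["dtype"]))  # assumes float32 I/O
        interp.invoke()
        return interp.get_tensor(out["index"]).flatten().astype(np.float32)

    img = Image.open("0001_c6_f0030809.jpg").convert("RGB").resize((128, 256))
    x = (np.asarray(img, dtype=np.float32) / 255.0)[np.newaxis, ...]  # NHWC, [0, 1]

    e_fp32 = embed("osnet_x1_0_imagenet_fp32_bs_1_float32.tflite", x)
    e_int8 = embed("osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite", x)
    cos = np.dot(e_fp32, e_int8) / (np.linalg.norm(e_fp32) * np.linalg.norm(e_int8))
    print(f"cosine similarity: {cos:.4f}")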
PINTO0309 commented Aug 8, 2023

The Float32 accuracy is in perfect agreement, so the model structure itself must simply be very fragile under quantization. onnx2tf does not perform any special operations with respect to quantization; it simply uses the standard features of TFLiteConverter.

Note that the Float32 model and the INT8 model are converted internally from the same Keras structural model, so there can be no cause for the degradation in accuracy other than structural reasons in the original model. I don't know all the activation functions and model structures that are vulnerable to quantization, so your investigative efforts will help me identify the causes of the degradation.

You will need to convert the model only up to an intermediate point using the -onimc option and identify at which locations the significant accuracy degradation occurs.

Float32 has no precision degradation whatsoever.

onnx2tf \
-i osnet_x1_0_fp_32_bs_1.onnx \
-cotof


@MustafaYounes1 (Author)

Thanks for the amazingly quick reply @PINTO0309

I can briefly describe the network structure as follows:

  • The underpinning building block of this network is the Lite 3x3 convolutional block; it is the same as the well-known MobileNet depthwise convolution.

(figure: the Lite 3x3 block)

  • The authors equipped the well-known ResNet residual block with this Lite 3x3 block and adopted a multi-stream topology: there are 4 sub-streams in each residual block, each with its own receptive field (determined by how many consecutive Lite 3x3 blocks it contains). Each sub-stream tries to learn features of a homogeneous scale, and the learned features are then fused by an aggregation unit (global average pooling -> 1x1 conv -> ReLU -> 1x1 conv).

(figure: the OSNet residual block with 4 multi-scale streams)

  • Regarding the network activations, there is nothing fancy at all; only 3 types of activations can be found in this network (ReLU, Sigmoid, and linear activation).

If you're interested in more details, you can find more information about the network topology in this paper, and you can have a look at the network implementation in this script.

However, I do agree with splitting the model into smaller subgraphs to see where the problem starts, but this will be time-consuming and I won't manage to do it quickly.

@PINTO0309 (Owner)

There was a bug in the behavior of the -onimc option that is being corrected. It will be improved in v1.15.9.

PINTO0309 commented Aug 8, 2023

I seem to have posted my comment at about the same time as yours.

Your information has given me an understanding of the structure of the model. Thank you.

However, onnx2tf faithfully reproduces the ONNX model structure, so I do not think the post-quantization accuracy degradation is a problem with onnx2tf itself.

I am not sure where the quantization problem lies here. In the past, when I experienced significant accuracy degradation with YOLO's SiLU, I identified the problem area through diligent research and searched for papers on accuracy degradation in INT8 quantization; as a result, I identified problems in SiLU (Swish), ReLU6, and Concat.

I have been working on quantization for about 5 years, and I remember that OSNet showed significant accuracy degradation. However, I have never done a more in-depth investigation.

Your solution to this problem is going to be a great contribution to the community.

Btw, if the bug in the -onimc option is fixed, it will be possible to split the model and see the changes in the output, as shown in the figure below.

(figure: outputs of the model split with -onimc)

@MustafaYounes1 (Author)

Thanks very much for this information.

I hope we can contribute to solving this issue. The -onimc fix will indeed simplify our investigation, so thanks a lot in advance!

@PINTO0309 (Owner)

The regression test by CI takes about 2 hours, so the latest version will be released in about 2 hours.

@PINTO0309 (Owner)

Fixes: https://github.com/PINTO0309/onnx2tf/releases/tag/1.15.9

  • example
    onnx2tf \
    -i osnet_x1_0_fp_32_bs_1.onnx \
    -onimc /conv2/conv2.0/Relu_output_0 \
    -oiqt \
    -qt per-tensor

MustafaYounes1 commented Aug 8, 2023

Thanks a lot for the provided fix!!

I started my investigations at the very beginning of the model, and things are getting interesting!

I'm trying to spot the position where the significant accuracy drop begins. To that end, I replaced the provided accuracy_check.py with a new script, subgraph_acc_check.py, which flattens the outputs of the subgraphs of both the float32 and int8 tflite models and measures the Euclidean distance between the flattened features. (Cosine similarity could be an issue if a feature vector has norm 0, which is why this code does not even calculate a normalized Euclidean distance.)

subgraph_acc_check.zip
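
In essence, the new check reduces to something like this (a sketch, not the attached script):

    import numpy as np

    def euclidean_gap(fp32_out: np.ndarray, int8_out: np.ndarray) -> float:
        """Flatten both subgraph outputs and return their plain L2 distance."""
        a = fp32_out.flatten().astype(np.float32)
        b = int8_out.flatten().astype(np.float32)
        print(f"Float32 model outputs flattened shape: {a.shape}")
        print(f"Int8 model outputs flattened shape: {b.shape}")
        # plain (unnormalized) Euclidean distance: safe even for zero-norm vectors
        return float(np.linalg.norm(a - b))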

  • First cut was at /conv1/relu/Relu_output_0 (before the residual blocks, right after ReLU):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_0 -onimc /conv1/relu/Relu_output_0 -oiqt -qt per-tensor

    Float32 model outputs flattened shape: (524288,)
    Int8 model outputs flattened shape: (524288,)
    Euclidean Distance: 6.558413505554199  # Acceptable gap

  • Second cut was at /maxpool/MaxPool_output_0 (before the residual blocks, right after MaxPool2D):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_1 -onimc /maxpool/MaxPool_output_0 -oiqt -qt per-tensor

    Float32 model outputs flattened shape: (131072,)
    Int8 model outputs flattened shape: (131072,)
    Euclidean Distance: 152.93930053710938  # Significant gap

Interestingly, I have found that the outputs of the int8 tflite model right after /maxpool/MaxPool_output_0 are all zeros!!

PINTO0309 commented Aug 8, 2023

I see. There is a tremendously small negative value padded in PadV2. Perhaps this is causing the calculation results to overflow.


This is the part. The minimum value of Float32 is used. However, it is only a guess.

    # use minimum limit value of data type for explicit padding value since this is max pooling
    padded_tensor = tf.pad(
        tensor=input_tensor,
        paddings=tf_pads,
        mode='CONSTANT',
        constant_values=input_tensor.dtype.min
    )

# add extra pad layer if needed
if is_explicit_padding and tf_pads != [0] * spatial_size * 2:
    warn(
        f'Tensorflow incompatible padding detected. ' \
        f'Extra pad layer is inserted automatically. '
    )
    if auto_pad == 'SAME_LOWER':
        # switch the order of pads
        tf_pads = [i for tup in zip(tf_pads[len(tf_pads) // 2:], tf_pads[:len(tf_pads) // 2]) for i in tup]
    # convert to tensorflow padding format
    tf_pads = \
        [[0, 0]] + \
        [list(i) for i in zip(tf_pads[:len(tf_pads) // 2], tf_pads[len(tf_pads) // 2:])] + \
        [[0, 0]]
    # use minimum limit value of data type for explicit padding value since this is max pooling
    padded_tensor = tf.pad(
        tensor=input_tensor,
        paddings=tf_pads,
        mode='CONSTANT',
        constant_values=input_tensor.dtype.min
    )
else:
    padded_tensor = input_tensor
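
To illustrate that guess with toy numbers: in affine int8 quantization the scale of a tensor is derived from its observed min/max, so a pad constant of float32 min stretches the calibrated range astronomically and every real activation collapses into a single quantized bin (the activation range below is made up, not measured from the model):

    import numpy as np

    acts_min, acts_max = -6.0, 6.0             # made-up pre-pool activation range
    pad = float(np.finfo(np.float32).min)      # -3.4028234663852886e+38

    scale_ok = (acts_max - acts_min) / 255.0   # ~0.047 per int8 step
    scale_bad = (acts_max - pad) / 255.0       # ~1.3e+36 per int8 step
    print(scale_ok, scale_bad)
    # With scale_bad, the whole [-6, 6] activation range maps to one integer
    # code, consistent with the all-zero MaxPool outputs observed above.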

Just changing the padding value to a larger value changed the output. However, the error is still large.

Padding constant_values: -3.4028234663852886e+38 -> -255.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else -255.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 97.63984680175781

Padding constant_values: -3.4028234663852886e+38 -> -128.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else -128.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 57.43968200683594

Padding constant_values: -3.4028234663852886e+38 -> 0.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else 0.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 2.7112042903900146

The padding process just before MaxPool2D appears to minimize the Euclidean distance only when the four sides of the input tensor are forcibly padded with a fixed value of zero. It seems that only when doing INT8 quantization do I have to force such workarounds to make it work.

I can only assume that the TFLite runtime behavior of MaxPool2D in the INT8 quantization model is strange. I'm having trouble deciding how to properly work around this TFLite runtime bug on the onnx2tf side, since padding with a fixed value of zero doesn't seem reasonable.

Reluctantly, I tried quantizing the entire model while forcing MaxPool2D to pad all 4 sides with zeros during INT8 quantization, and the Euclidean distance appears to be within the acceptable error range.

onnx2tf \
-i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_2 \
-oiqt \
-qt per-tensor \
-cotof


Float32 model outputs flattened shape: (512,)
Int8 model outputs flattened shape: (512,)
Euclidean Distance: 2.0942165851593018

PINTO0309 added the discussion and OP:MaxPool labels Aug 9, 2023
PINTO0309 added the third party label Aug 9, 2023
PINTO0309 added a commit that referenced this issue Aug 9, 2023:
Implemented a workaround to deal with the problem that padding with the minimum value causes the output error of `MaxPool2D` to be maximized only when quantizing with INT8 quantization. #444
@mikel-brostrom (Contributor)

Can you confirm that this has been resolved, @MustafaYounes1? Planning to integrate this in: https://github.com/mikel-brostrom/yolo_tracking

MustafaYounes1 commented Aug 9, 2023

Thanks @PINTO0309 for your efforts!!

I did a test on my custom data, and the int8 tflite model works quite well with the adjusted padding constant value! (about a 0.4 accuracy drop, which is pretty acceptable)

I totally understand your concerns regarding the integration between onnx2tf and tf.lite.TFLiteConverter, and padding with zeros looks good to me when performing int8 quantization (it seems that a tremendously small padding value is not suitable for int8 quantization). However, if you are not totally satisfied with the new pad behaviour, we can investigate this fix with other models; otherwise, we can close this issue as resolved.

@mikel-brostrom hope you have got your answer.


mikel-brostrom commented Aug 9, 2023

Thx @MustafaYounes1 for letting me know 😄

@PINTO0309 (Owner)

@MustafaYounes1
It is true that it remains to be seen whether other models will give correct quantization results, but for now I think it is fine as it is. If you run into problems when quantizing other models, it would be helpful if you could help us discuss and investigate again to resolve the issue.

@mikel-brostrom
I'm glad it seems to be working. Come to think of it, the model you posted an issue on the other day was also OSNet.

@mikel-brostrom (Contributor)

Come to think of it, the model you posted an issue on the other day was also OSNet.

Yup 😄. This serves me well 👍

@MustafaYounes1 (Author)

I'll gladly discuss other potential issues with you again @PINTO0309

Thank you very much, and since we can get a valid quantized OSNet now, I will close this issue.
