
[OSNet] int8 tflite model - catastrophic accuracy degradation #444

Closed
MustafaYounes1 opened this issue Aug 8, 2023 · 16 comments
Labels: discussion, OP:MaxPool, Quantization, third party


MustafaYounes1 commented Aug 8, 2023

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.1

onnxruntime version number

1.15.0

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_quant_issue.zip

Parameter Replacement JSON

None

Description

Source Model Information

OSNet is a person re-ID model that was trained using PyTorch and converted to ONNX with the pre-trained ImageNet weights.

onnx2tf conversion command

onnx2tf \
-i ./../onnx_models/040_osnet_x1_0/onnx_fp_32_bs_1.onnx \
-o ./../tflite_models/040_osnet_x1_0/osnet_x1_0_bs_1/ \
-otfv1pb \
-osd \
-oiqt \
-qt per-tensor \
-cind "input.1" "./../calibration_data/onnx2tf_calib/calib_data_duke_500_bs_1_nhwc_fp32.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"

The quantization process was calibrated using 100 samples from the DukeMTMC person re-ID dataset; the samples were normalized to [0, 1] and preprocessed accordingly.
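
For reference, here is a minimal sketch of how a calibration file like the one above might be produced (the folder name and the assumed 256x128 input resolution are hypothetical; the exact array shape expected by -cind should be checked against the onnx2tf README):

    import glob

    import numpy as np
    from PIL import Image

    H, W = 256, 128  # assumed OSNet input resolution (height, width)
    paths = sorted(glob.glob("duke_crops/*.jpg"))  # hypothetical folder of DukeMTMC crops

    samples = []
    for p in paths:
        img = Image.open(p).convert("RGB").resize((W, H))  # PIL resize takes (width, height)
        samples.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]

    # (N, H, W, 3) float32 NHWC array; the mean/std passed via -cind are applied by onnx2tf
    np.save("calib_data_duke_500_bs_1_nhwc_fp32.npy", np.stack(samples))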

Issue Description

I checked the accuracy of the converted float32 tflite model, and it was pretty much the same as the source model. However, when I checked the accuracy of the int8 model, I encountered a catastrophic accuracy drop (more than 95%).

I read Section 7 of the README file, and it clearly states that this could be a matter of the model structure. Is there any way to fix this problem?

Resources

You can find the following resources in the attached zip file:

  1. osnet_x1_0_fp_32_bs_1.onnx : the source ONNX model.
  2. osnet_x1_0_imagenet_fp32_bs_1_float32.tflite: output fp32 tflite model.
  3. osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite: the output int8 tflite model.
  4. accuracy_check.py: a Python script that takes the fp32/int8 tflite models and an input image, runs both models on it, and measures the cosine similarity of the output embeddings; this simplifies the accuracy check on your end (a minimal sketch follows this list).
  5. 0001_c6_f0030809.jpg: an input image sample from the DukeMTMC dataset.
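
For orientation, here is a sketch of the kind of check accuracy_check.py performs (this is not the attached script; the input size is an assumption, and it assumes the _integer_quant variant keeps float32 inputs/outputs):

    import numpy as np
    import tensorflow as tf
    from PIL import Image

    def embed(model_path: str, x: np.ndarray) -> np.ndarray:
        """Run one TFLite model and return the flattened output embedding."""
        interp = tf.lite.Interpreter(model_path=model_path)
        interp.allocate_tensors()
        inp = interp.get_input_details()[0]
        out = interp.get_output_details()[0]
        interp.set_tensor(inp["index"], x.astype(inp["dtype"]))  # assumes float32 I/O
        interp.invoke()
        return interp.get_tensor(out["index"]).flatten().astype(np.float32)

    img = Image.open("0001_c6_f0030809.jpg").convert("RGB").resize((128, 256))
    x = (np.asarray(img, dtype=np.float32) / 255.0)[np.newaxis, ...]  # NHWC, [0, 1]

    e_fp32 = embed("osnet_x1_0_imagenet_fp32_bs_1_float32.tflite", x)
    e_int8 = embed("osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite", x)
    cos = np.dot(e_fp32, e_int8) / (np.linalg.norm(e_fp32) * np.linalg.norm(e_int8))
    print(f"cosine similarity: {cos:.4f}")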
PINTO0309 commented Aug 8, 2023

The Float32 accuracy is in perfect agreement, so the model structure itself must simply be very fragile under quantization. onnx2tf does not perform any special operations with respect to quantization; it simply uses the standard features of TFLiteConverter.

Note that the Float32 model and the INT8 model are converted internally from the same Keras structural model, so there can be no cause for the degradation in accuracy other than structural reasons in the original model. I don't know all the activation functions and model structures that are vulnerable to quantization, so your investigative efforts will help me identify the causes of the degradation.

You will need to convert the model only up to an intermediate point using the -onimc option and identify at which locations the significant accuracy degradation occurs.

Float32 has no precision degradation whatsoever.

onnx2tf \
-i osnet_x1_0_fp_32_bs_1.onnx \
-cotof


@MustafaYounes1 (Author)

Thanks for the amazingly quick reply @PINTO0309

I can briefly describe the network structure as follows:

  • The underpinning building block of this network is the Lite 3x3 convolutional block; it is the same as the well-known MobileNet depthwise convolution.

(figure: the Lite 3x3 block)

  • The authors equipped the well-known ResNet residual block with this Lite 3x3 block and adopted a multi-stream topology: there are 4 sub-streams in each residual block, each with its own receptive field (determined by how many consecutive Lite 3x3 blocks it contains). Each sub-stream tries to learn features of a homogeneous scale, and the learned features are then fused by an aggregation unit (global average pooling -> 1x1 conv -> ReLU -> 1x1 conv).

(figure: the OSNet residual block with 4 multi-scale streams)

  • Regarding the network activations, there is nothing fancy at all; only 3 types of activations can be found in this network (ReLU, Sigmoid, and linear activation).

If you're interested in more details, you can find more information about the network topology in this paper, and you can have a look at the network implementation in this script.

However, I do agree with splitting the model into smaller subgraphs to see where the problem starts, but this will be time-consuming and I won't manage to do it quickly.

@PINTO0309 (Owner)

There was a bug in the behavior of the -onimc option that is being corrected. It will be improved in v1.15.9.

PINTO0309 commented Aug 8, 2023

I seem to have posted my comment at about the same time as yours.

Your information has given me an understanding of the structure of the model. Thank you.

However, onnx2tf faithfully reproduces the ONNX model structure, so I do not think the post-quantization accuracy degradation is a problem with onnx2tf itself.

I am not sure where the quantization problem lies here. In the past, when I experienced significant accuracy degradation with YOLO's SiLU, I identified the problem area through diligent research and searched for papers on accuracy degradation in INT8 quantization; as a result, I identified problems in SiLU (Swish), ReLU6, and Concat.

I have been working on quantization for about 5 years, and I remember that OSNet showed significant accuracy degradation. However, I have never done a more in-depth investigation.

Your solution to this problem is going to be a great contribution to the community.

Btw, if the bug in the -onimc option is fixed, it will be possible to split the model and see the changes in the output, as shown in the figure below.

(figure: outputs of the model split with -onimc)

@MustafaYounes1 (Author)

Thanks very much for this information.

I hope we can contribute to solving this issue. The -onimc fix will indeed simplify our investigation, so thanks a lot in advance!

@PINTO0309 (Owner)

The regression test by CI takes about 2 hours, so the latest version will be released in about 2 hours.

@PINTO0309 (Owner)

Fixes: https://github.com/PINTO0309/onnx2tf/releases/tag/1.15.9

  • example
    onnx2tf \
    -i osnet_x1_0_fp_32_bs_1.onnx \
    -onimc /conv2/conv2.0/Relu_output_0 \
    -oiqt \
    -qt per-tensor

MustafaYounes1 commented Aug 8, 2023

Thanks a lot for the provided fix!!

I started my investigations at the very beginning of the model, and things are getting interesting!

I'm trying to spot the position where the significant accuracy drop begins. To that end, I replaced the provided accuracy_check.py with a new script, subgraph_acc_check.py, which flattens the outputs of the subgraphs of both the float32 and int8 tflite models and measures the Euclidean distance between the flattened features. (Cosine similarity could be an issue if a feature vector has norm 0, which is why this code does not even calculate a normalized Euclidean distance.)

subgraph_acc_check.zip
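
In essence, the new check reduces to something like this (a sketch, not the attached script):

    import numpy as np

    def euclidean_gap(fp32_out: np.ndarray, int8_out: np.ndarray) -> float:
        """Flatten both subgraph outputs and return their plain L2 distance."""
        a = fp32_out.flatten().astype(np.float32)
        b = int8_out.flatten().astype(np.float32)
        print(f"Float32 model outputs flattened shape: {a.shape}")
        print(f"Int8 model outputs flattened shape: {b.shape}")
        # plain (unnormalized) Euclidean distance: safe even for zero-norm vectors
        return float(np.linalg.norm(a - b))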

  • First cut was at /conv1/relu/Relu_output_0 (before the residual blocks, right after ReLU):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_0 -onimc /conv1/relu/Relu_output_0 -oiqt -qt per-tensor

    Float32 model outputs flattened shape: (524288,)
    Int8 model outputs flattened shape: (524288,)
    Euclidean Distance: 6.558413505554199  # Acceptable gap

  • Second cut was at /maxpool/MaxPool_output_0 (before the residual blocks, right after MaxPool2D):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_1 -onimc /maxpool/MaxPool_output_0 -oiqt -qt per-tensor

    Float32 model outputs flattened shape: (131072,)
    Int8 model outputs flattened shape: (131072,)
    Euclidean Distance: 152.93930053710938  # Significant gap

Interestingly, I have found that the outputs of the int8 tflite model right after /maxpool/MaxPool_output_0 are all zeros!!

PINTO0309 commented Aug 8, 2023

I see. There is a tremendously small negative value padded in PadV2. Perhaps this is causing the calculation results to overflow.


This is the part. The minimum value of Float32 is used. However, it is only a guess.

    # use minimum limit value of data type for explicit padding value since this is max pooling
    padded_tensor = tf.pad(
        tensor=input_tensor,
        paddings=tf_pads,
        mode='CONSTANT',
        constant_values=input_tensor.dtype.min
    )

# add extra pad layer if needed
if is_explicit_padding and tf_pads != [0] * spatial_size * 2:
    warn(
        f'Tensorflow incompatible padding detected. ' \
        f'Extra pad layer is inserted automatically. '
    )
    if auto_pad == 'SAME_LOWER':
        # switch the order of pads
        tf_pads = [i for tup in zip(tf_pads[len(tf_pads) // 2:], tf_pads[:len(tf_pads) // 2]) for i in tup]
    # convert to tensorflow padding format
    tf_pads = \
        [[0, 0]] + \
        [list(i) for i in zip(tf_pads[:len(tf_pads) // 2], tf_pads[len(tf_pads) // 2:])] + \
        [[0, 0]]
    # use minimum limit value of data type for explicit padding value since this is max pooling
    padded_tensor = tf.pad(
        tensor=input_tensor,
        paddings=tf_pads,
        mode='CONSTANT',
        constant_values=input_tensor.dtype.min
    )
else:
    padded_tensor = input_tensor
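
To illustrate that guess with toy numbers: in affine int8 quantization the scale of a tensor is derived from its observed min/max, so a pad constant of float32 min stretches the calibrated range astronomically and every real activation collapses into a single quantized bin (the activation range below is made up, not measured from the model):

    import numpy as np

    acts_min, acts_max = -6.0, 6.0             # made-up pre-pool activation range
    pad = float(np.finfo(np.float32).min)      # -3.4028234663852886e+38

    scale_ok = (acts_max - acts_min) / 255.0   # ~0.047 per int8 step
    scale_bad = (acts_max - pad) / 255.0       # ~1.3e+36 per int8 step
    print(scale_ok, scale_bad)
    # With scale_bad, the whole [-6, 6] activation range maps to one integer
    # code, consistent with the all-zero MaxPool outputs observed above.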

Just changing the padding value to a larger value changed the output. However, the error is still large.

Padding constant_values: -3.4028234663852886e+38 -> -255.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else -255.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 97.63984680175781

Padding constant_values: -3.4028234663852886e+38 -> -128.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else -128.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 57.43968200683594

Padding constant_values: -3.4028234663852886e+38 -> 0.0

        # use minimum limit value of data type for explicit padding value since this is max pooling
        padded_tensor = tf.pad(
            tensor=input_tensor,
            paddings=tf_pads,
            mode='CONSTANT',
            constant_values=input_tensor.dtype.min \
                if not output_integer_quantized_tflite else 0.0
        )


Float32 model outputs flattened shape: (131072,)
Int8 model outputs flattened shape: (131072,)
Euclidean Distance: 2.7112042903900146

The padding process just before MaxPool2D appears to minimize the Euclidean distance only when the four sides of the input tensor are forcibly padded with a fixed value of zero. It seems that only when doing INT8 quantization do I have to force such workarounds to make it work.

I can only assume that the TFLite runtime behavior of MaxPool2D in the INT8 quantization model is strange. I'm having trouble deciding how to properly work around this TFLite runtime bug on the onnx2tf side, since padding with a fixed value of zero doesn't seem reasonable.

Reluctantly, I tried quantizing the entire model while forcing MaxPool2D to pad all 4 sides with zeros during INT8 quantization, and the Euclidean distance appears to be within the acceptable error range.

onnx2tf \
-i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_2 \
-oiqt \
-qt per-tensor \
-cotof


Float32 model outputs flattened shape: (512,)
Int8 model outputs flattened shape: (512,)
Euclidean Distance: 2.0942165851593018

PINTO0309 added the discussion and OP:MaxPool labels Aug 9, 2023
PINTO0309 added the third party label Aug 9, 2023
PINTO0309 added a commit that referenced this issue Aug 9, 2023:
Implemented a workaround to deal with the problem that padding with the minimum value causes the output error of `MaxPool2D` to be maximized only when quantizing with INT8 quantization. #444
@mikel-brostrom (Contributor)

Can you confirm that this has been resolved, @MustafaYounes1? Planning to integrate this in: https://github.com/mikel-brostrom/yolo_tracking

MustafaYounes1 commented Aug 9, 2023

Thanks @PINTO0309 for your efforts!!

I did a test on my custom data, and the int8 tflite model works quite well with the adjusted padding constant value! (about a 0.4 accuracy drop, which is pretty acceptable)

I totally understand your concerns regarding the integration between onnx2tf and tf.lite.TFLiteConverter, and padding with zeros looks good to me when performing int8 quantization (it seems that a tremendously small padding value is not suitable for int8 quantization). However, if you are not totally satisfied with the new pad behaviour, we can investigate this fix with other models; otherwise, we can close this issue as resolved.

@mikel-brostrom hope you have got your answer.


mikel-brostrom commented Aug 9, 2023

Thx @MustafaYounes1 for letting me know 😄

@PINTO0309 (Owner)

@MustafaYounes1
It is true that it remains to be seen whether other models will give correct quantization results, but for now I think it is fine as it is. If you run into problems when quantizing other models, it would be helpful if you could help us discuss and investigate again to resolve the issue.

@mikel-brostrom
I'm glad it seems to be working. Come to think of it, the model you posted an issue on the other day was also OSNet.

@mikel-brostrom (Contributor)

Come to think of it, the model you posted an issue on the other day was also OSNet.

Yup 😄. This serves me well 👍

@MustafaYounes1 (Author)

I'll gladly discuss other potential issues with you again @PINTO0309

Thank you very much, and since we can get a valid quantized OSNet now, I will close this issue.
