
Is it possible to control scale and zero point of full integer quantized (INT8) tflite model during conversion? #709

Closed
deltacosos opened this issue Oct 11, 2024 · 5 comments


@deltacosos

Issue Type

Feature Request

OS

Linux

onnx2tf version number

1.20.0

onnx version number

1.16.2

onnxruntime version number

1.19.2

onnxsim (onnx_simplifier) version number

tensorflow version number

2.17.0

Download URL for ONNX

https://github.com/ultralytics/ultralytics/blob/main/docs/en/models/yolov8.md

SiLU activations replaced with ReLUs. Input resolution 320.

Parameter Replacement JSON

-

Description

  1. My purpose is to personally test how tflite conversion and quantization behave when no scale and offset correction has to be run on the CPU while the object detection model is running.
  2. Currently, when I quantize the model with a representative dataset, the scale and zero point of the input tensor are correct (0.003921569 and 0.0 respectively), but I cannot get the same values for the output tensor (currently they are 0.00605 and -7). The confidence part of the output quantizes correctly, but something goes wrong in the conversion of the coordinates coming from dist2bbox. (The resulting parameters can be read back with the snippet after this list.)
  3. I tried a multi-output model, placing the output post-processing layers in a different order, and changing the TensorFlow version, all without success.
  4. It would be hugely beneficial if this feature could be added, or if I could be shown how to apply it.
  5. As input I trained a yolov8 model with the ultralytics library, but replaced the SiLU activations with ReLUs.
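
For reference, a minimal way to read back the (scale, zero_point) pairs mentioned above via the standard tf.lite.Interpreter API (the model file name is illustrative):

```python
import tensorflow as tf

# Load the fully integer-quantized model (path is illustrative).
interpreter = tf.lite.Interpreter(model_path="yolov8_int8.tflite")

# Each details dict exposes a (scale, zero_point) tuple under "quantization".
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print("input :", input_details["quantization"])   # e.g. (0.003921569, 0)
print("output:", output_details["quantization"])  # e.g. (0.00605, -7)
```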
@PINTO0309 added the Quantization label Oct 11, 2024
@PINTO0309 (Owner) commented Oct 11, 2024

@deltacosos (Author) commented Oct 11, 2024

Yes, I think it is very much related to issue #269. I have also normalized the coordinates to the range [0, 1], so they are in the same range as the confidence values. It is entirely possible that I should simply route them to separate outputs instead of using Concat, which might confuse TensorFlow quantization (see the sketch below). I was just wondering whether the output scale and zero_point could be frozen before quantization starts. Freezing activations and weights can be done with other quantization libraries such as tfmot and AIMET, but I am not sure about TFLiteConverter. Another possibility is to change them afterwards.
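
A minimal sketch of that separate-outputs idea (the wrapper and the channel split are hypothetical, assuming the usual (batch, 4 + num_classes, num_anchors) head layout):

```python
import torch

class SplitOutputs(torch.nn.Module):
    """Hypothetical wrapper: exporting boxes and scores as two separate
    outputs lets each tensor get its own (scale, zero_point) during
    post-training quantization, instead of a single Concat output whose
    one range must cover both."""

    def __init__(self, model, num_box_channels=4):
        super().__init__()
        self.model = model
        self.num_box_channels = num_box_channels

    def forward(self, x):
        y = self.model(x)                      # assumed (N, 4 + nc, anchors)
        boxes = y[:, :self.num_box_channels]   # dist2bbox coordinates in [0, 1]
        scores = y[:, self.num_box_channels:]  # per-class confidences in [0, 1]
        return boxes, scores
```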

@PINTO0309 (Owner) commented Oct 11, 2024

Just rewrite the flatbuffer in Python with the flatbuffers package, along the lines of:

def rewrite_tflite_inout_opname(
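
As an illustration of that approach, here is a minimal sketch that patches the (scale, zero_point) of the graph outputs in a .tflite file, following the unpack/pack pattern used in TensorFlow's own flatbuffer tooling. File names and target values are illustrative, and note that relabeling the parameters only changes how consumers interpret the stored int8 values; it does not recompute them:

```python
import flatbuffers
from tensorflow.lite.python import schema_py_generated as schema_fb

def set_output_quant_params(src_path, dst_path, scale, zero_point):
    with open(src_path, "rb") as f:
        buf = bytearray(f.read())

    # Unpack the flatbuffer into the mutable object-API tree.
    model = schema_fb.ModelT.InitFromObj(
        schema_fb.Model.GetRootAsModel(buf, 0))

    # Overwrite (scale, zero_point) on every graph output tensor.
    subgraph = model.subgraphs[0]
    for out_idx in subgraph.outputs:
        tensor = subgraph.tensors[out_idx]
        if tensor.quantization is not None:
            tensor.quantization.scale = [scale]
            tensor.quantization.zeroPoint = [zero_point]

    # Repack and write the model with the TFLite file identifier.
    builder = flatbuffers.Builder(1024)
    builder.Finish(model.Pack(builder), file_identifier=b"TFL3")
    with open(dst_path, "wb") as f:
        f.write(bytes(builder.Output()))

set_output_quant_params("model_int8.tflite", "model_patched.tflite",
                        scale=1.0 / 255.0, zero_point=0)
```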

@deltacosos (Author)

Okay, I will have a look at it and report back soon on how it worked :)

@github-actions bot commented

If there is no activity within the next two days, this issue will be closed automatically.

@github-actions bot closed this as not planned (stale) Oct 18, 2024