Optimizing TensorFlow models with 8-bit quantization using the Neural Network Compression Framework (NNCF) of OpenVINO
This tutorial demonstrates how to use NNCF 8-bit quantization to optimize a TensorFlow model for inference with the OpenVINO Toolkit. For more advanced usage, refer to these examples.
To make downloading and training fast, we use a ResNet-18 model with the Imagenette dataset. Imagenette is a subset of 10 easily classified classes from the ImageNet dataset.
This tutorial consists of the following steps:
- Fine-tune the FP32 model
- Transform the original FP32 model to INT8
- Fine-tune the INT8 model to restore accuracy
- Export the optimized and original models to Frozen Graph and then to OpenVINO
- Measure and compare the performance of the models
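To give a feel for what the FP32-to-INT8 step does numerically, here is a minimal sketch of symmetric, per-tensor 8-bit quantization in plain Python. It is a simplified illustration and not NNCF's actual implementation: the function name `fake_quantize_int8`, the symmetric range of ±127, and the sample weights are all assumptions made for this example. The rounding error it introduces is the kind of accuracy loss that the later fine-tuning step is meant to recover.

```python
def fake_quantize_int8(values):
    """Quantize floats to signed 8-bit levels and map them back to floats.

    Hypothetical helper for illustration only; NNCF handles this inside
    the model graph and chooses its own ranges and granularity.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        # All-zero input: any scale works, nothing is lost.
        return [0] * len(values), 1.0, list(values)

    # One scale for the whole tensor; the largest magnitude maps to 127.
    scale = max_abs / 127.0

    # Round to the nearest integer level and clamp to the symmetric range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]

    # Dequantize: this is what the rest of the network "sees" in FP32.
    dequantized = [q * scale for q in quantized]
    return quantized, scale, dequantized


weights = [0.5, -1.27, 0.003, 1.0]  # made-up sample values
quantized, scale, dequantized = fake_quantize_int8(weights)

for w, q, d in zip(weights, quantized, dequantized):
    print(f"fp32={w:+.4f}  int8={q:+4d}  restored={d:+.4f}  error={abs(w - d):.4f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), so small weights such as `0.003` can collapse to zero when a single large weight dominates the range.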
If you have not done so already, please follow the Installation Guide to install all required dependencies.