
Intel® Low Precision Optimization Tool aims to provide a unified low-precision inference interface across different deep learning frameworks, and supports auto-tuning against a specified accuracy criterion to find the best quantized model.


Introduction to Intel® LPOT

The Intel® Low Precision Optimization Tool (Intel® LPOT) is an open-source Python library that delivers a unified low-precision inference interface across multiple Intel-optimized Deep Learning (DL) frameworks on both CPUs and GPUs. It supports automatic accuracy-driven tuning strategies, along with additional objectives such as optimizing for performance, model size, and memory footprint. It also provides easy extension capability for new backends, tuning strategies, metrics, and objectives.

Note

GPU support is under development.

Visit the Intel® LPOT online documentation website at: https://intel.github.io/lpot.

Architecture

Intel® LPOT features an infrastructure and workflow that help increase performance and speed up deployment across architectures.

Infrastructure

[Figure: Infrastructure]

Workflow

[Figure: Workflow]

Supported Frameworks

Supported Intel-optimized DL frameworks are:

  • TensorFlow
  • PyTorch
  • MXNet
  • ONNX Runtime

Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running LPOT quantization or deploying the quantized model.

Note: Starting with official TensorFlow 2.6.0, oneDNN support has been upstreamed. Users only need to download the official TensorFlow binary for CPU devices and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running LPOT quantization or deploying the quantized model.
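The two notes above can be applied from Python as well as from the shell. The following is a minimal sketch: the helper name `tf_env_for_lpot` is hypothetical, and it assumes the variables should be set before TensorFlow is imported so that the setting is in place when the library starts. It does not distinguish Intel Optimized TensorFlow from the official build, so check which binary is installed.

```python
import os


def tf_env_for_lpot(tf_version: str) -> dict:
    """Return the environment variables that the notes above call for.

    Hypothetical helper, not part of LPOT or TensorFlow.
    """
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) >= (2, 6):
        # Official TensorFlow 2.6.0 and later: enable the upstreamed oneDNN path.
        return {"TF_ENABLE_ONEDNN_OPTS": "1"}
    if (major, minor) == (2, 5):
        # Intel Optimized TensorFlow 2.5.0: disable the MKL native format.
        return {"TF_ENABLE_MKL_NATIVE_FORMAT": "0"}
    # Other versions: the notes do not mention any variable.
    return {}


# Apply the settings before TensorFlow is imported.
os.environ.update(tf_env_for_lpot("2.6.0"))
```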

Installation

Select the installation based on your operating system.

Linux Installation

You can install LPOT using one of three options: Install just the LPOT library from binary or source, or get the Intel-optimized framework together with the LPOT library by installing the Intel® oneAPI AI Analytics Toolkit.

Option 1 Install from binary

# install stable version from pip
pip install lpot

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ lpot

# install stable version from conda
conda install lpot -c conda-forge -c intel 

Option 2 Install from source

git clone https://github.com/intel/lpot.git
cd lpot
pip install -r requirements.txt
python setup.py install

Option 3 Install from AI Kit

The Intel® LPOT library is released as part of the Intel® oneAPI AI Analytics Toolkit (AI Kit). The AI Kit provides a consolidated package of Intel's latest deep learning and machine learning optimizations all in one place for ease of development. Along with LPOT, the AI Kit includes Intel-optimized versions of deep learning frameworks (such as TensorFlow and PyTorch) and high-performing Python libraries to streamline end-to-end data science and AI workflows on Intel architectures.

The AI Kit is distributed through many common channels, including from Intel's website, YUM, APT, Anaconda, and more. Select and download the AI Kit distribution package that's best suited for you and follow the Get Started Guide for post-installation instructions.

  • Download AI Kit
  • AI Kit Get Started Guide

Windows Installation

Prerequisites

The following prerequisites and requirements must be satisfied for a successful installation:

  • Python version: 3.6, 3.7, 3.8, or 3.9

  • Download and install Anaconda.

  • Create a virtual environment named lpot in Anaconda:

    # Here we install python 3.7 for instance. You can also choose python 3.6, 3.8, or 3.9.
    conda create -n lpot python=3.7
    conda activate lpot
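Once the environment is active, the Python version prerequisite can be checked from the interpreter itself. This is a small sketch; the function name `is_supported_python` is hypothetical, and the version set is copied from the prerequisite list above.

```python
import sys

# Versions listed in the prerequisites above.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


print("Supported Python version:", is_supported_python())
```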

Installation options

Option 1 Install from binary

# install stable version from pip
pip install lpot

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ lpot

# install from conda
conda install lpot -c conda-forge -c intel 

Option 2 Install from source

git clone https://github.com/intel/lpot.git
cd lpot
pip install -r requirements.txt
python setup.py install
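After any of the installation options, a quick check can confirm that the package is visible to the active interpreter. The sketch below uses only the standard library and assumes the import name matches the package name `lpot`; the helper `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package: str = "lpot") -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


print("lpot importable:", is_installed())
```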

Documentation

Get Started

  • APIs explains Intel® Low Precision Optimization Tool's API.
  • Transform introduces how to utilize LPOT's built-in data processing and how to develop a custom data processing method.
  • Dataset introduces how to utilize LPOT's built-in dataset and how to develop a custom dataset.
  • Metric introduces how to utilize LPOT's built-in metrics and how to develop a custom metric.
  • Tutorial provides comprehensive instructions on how to utilize LPOT's features with examples.
  • Examples are provided to demonstrate the usage of LPOT in different frameworks: TensorFlow, PyTorch, MXNet, and ONNX Runtime.
  • UX is a web-based system used to simplify LPOT usage.
  • Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.
  • AI and Analytics Samples includes code samples for Intel oneAPI libraries.

Deep Dive

  • Quantization enables inference and training by performing computations in low-precision data types, such as fixed-point integers. LPOT supports Post-Training Quantization (PTQ) with different quantization capabilities and Quantization-Aware Training (QAT). Note that Dynamic Quantization currently has limited support.
  • Pruning provides a common method for introducing sparsity in weights and activations.
  • Benchmarking introduces how to utilize the benchmark interface of LPOT.
  • Mixed precision introduces how to enable mixed precision, including BF16, INT8, and FP32, on Intel platforms during tuning.
  • Graph Optimization introduces how to enable graph optimization for FP32 and auto-mixed precision.
  • Model Conversion introduces how to convert a TensorFlow QAT model to a quantized model that runs on Intel platforms.
  • TensorBoard provides tensor histograms and execution graphs for tuning debugging purposes.
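To make the quantization idea in the list above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It illustrates the general technique only and is not LPOT's implementation; the function names and the [-127, 127] range are assumptions chosen for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # The scale maps the largest magnitude onto 127; guard against all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
print(q_weights)
print(restored)
```

The difference between `weights` and `restored` is the quantization error, which is what accuracy-driven tuning tries to keep within an acceptable bound.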

Advanced Topics

  • Adaptor is the interface between LPOT and a framework. How to develop an adaptor extension is explained using ONNX Runtime as an example.
  • Strategy automatically optimizes low-precision recipes for deep learning models to meet objectives such as inference performance and memory usage within the expected accuracy criteria. How to develop a new strategy is also explained.
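The idea behind an accuracy-driven strategy can be sketched as a loop that tries candidate low-precision configurations and stops at the first one that meets the accuracy criterion. Everything below is hypothetical: the `tune` function, the candidate names, the toy accuracy numbers, and the 1% relative-loss threshold are illustrative assumptions, not LPOT's strategy interface.

```python
def tune(candidates, evaluate, fp32_accuracy, max_relative_loss=0.01):
    """Return (config, accuracy) for the first candidate meeting the criterion.

    Returns None if no candidate stays within the allowed relative accuracy loss.
    """
    for config in candidates:
        accuracy = evaluate(config)
        relative_loss = (fp32_accuracy - accuracy) / fp32_accuracy
        if relative_loss <= max_relative_loss:
            return config, accuracy
    return None


# Toy stand-in for a real evaluation run: accuracy per candidate configuration.
toy_accuracy = {
    "all_int8": 0.700,
    "conv_int8_only": 0.757,
    "fp32_fallback": 0.761,
}

result = tune(
    candidates=["all_int8", "conv_int8_only", "fp32_fallback"],
    evaluate=toy_accuracy.get,
    fp32_accuracy=0.761,
)
print(result)
```

Ordering candidates from most to least aggressive means the first configuration that passes is also the most aggressive one that meets the criterion, which is one simple way to trade accuracy against performance.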

Publications

For the full publication list, please refer to here.

System Requirements

Intel® Low Precision Optimization Tool supports systems based on Intel 64 architecture or compatible processors, specially optimized for the following CPUs:

  • Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Ice Lake)
  • Future Intel Xeon Scalable processor (code name Sapphire Rapids)

Intel® Low Precision Optimization Tool requires installing the pertinent Intel-optimized framework version for TensorFlow, PyTorch, MXNet, and ONNX Runtime.

Validated Hardware/Software Environment

  • Platform: Cascade Lake, Cooper Lake, Skylake, Ice Lake
  • OS: CentOS 8.3, Ubuntu 18.04
  • Python: 3.6, 3.7, 3.8, 3.9
  • TensorFlow: 2.6.0, 2.5.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 1.15.0 UP1, 1.15.0 UP2, 1.15.0 UP3, 1.15.2
  • PyTorch: 1.5.0+cpu, 1.6.0+cpu, 1.8.0+cpu, IPEX
  • MXNet: 1.8.0, 1.7.0, 1.6.0
  • ONNX Runtime: 1.6.0, 1.7.0, 1.8.0

Validated Models

Intel® Low Precision Optimization Tool provides numerous examples that show small accuracy loss alongside the best performance gain. A full list of quantized models on various frameworks is available in the Model List.

Validated MLPerf Models

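| Model | Framework | Support | Example |
|---|---|---|---|
| ResNet50 v1.5 | TensorFlow | Yes | Link |
| ResNet50 v1.5 | PyTorch | Yes | Link |
| DLRM | PyTorch | Yes | Link |
| BERT-large | TensorFlow | Yes | Link |
| BERT-large | PyTorch | Yes | Link |
| SSD-ResNet34 | TensorFlow | WIP | |
| SSD-ResNet34 | PyTorch | Yes | Link |
| RNN-T | PyTorch | WIP | |
| 3D-UNet | TensorFlow | WIP | |
| 3D-UNet | PyTorch | Yes | Link |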
Validated Quantized Models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | Performance Speedup: Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|
| tensorflow | 2.5.0 | resnet50v1.0 | 74.24% | 74.27% | -0.04% | 2.67x |
| tensorflow | 2.5.0 | resnet50v1.5 | 76.94% | 76.46% | 0.63% | 2.54x |
| tensorflow | 2.5.0 | resnet101 | 77.21% | 76.45% | 0.99% | 2.46x |
| tensorflow | 2.5.0 | inception_v1 | 70.30% | 69.74% | 0.80% | 1.63x |
| tensorflow | 2.5.0 | inception_v2 | 74.27% | 73.97% | 0.41% | 1.74x |
| tensorflow | 2.5.0 | inception_v3 | 77.29% | 76.75% | 0.70% | 2.11x |
| tensorflow | 2.5.0 | inception_v4 | 80.36% | 80.27% | 0.11% | 2.59x |
| tensorflow | 2.5.0 | inception_resnet_v2 | 80.42% | 80.40% | 0.02% | 1.86x |
| tensorflow | 2.5.0 | mobilenetv1 | 73.93% | 70.96% | 4.19% | 2.27x |
| tensorflow | 2.5.0 | mobilenetv2 | 71.96% | 71.76% | 0.28% | 1.78x |
| tensorflow | 2.5.0 | vgg16 | 72.13% | 70.89% | 1.75% | 3.86x |
| tensorflow | 2.5.0 | vgg19 | 72.35% | 71.01% | 1.89% | 3.90x |

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | Performance Speedup: Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|
| pytorch | 1.8.0+cpu | resnet18 | 69.58% | 69.76% | -0.26% | 2.13x |
| pytorch | 1.8.0+cpu | resnet50 | 75.87% | 76.13% | -0.34% | 3.11x |
| pytorch | 1.8.0+cpu | resnext101_32x8d | 79.09% | 79.31% | -0.28% | 4.99x |
| pytorch | 1.8.0+cpu | bert_base_mrpc | 87.92% | 88.73% | -0.91% | 1.79x |
| pytorch | 1.8.0+cpu | bert_base_cola | 58.33% | 58.84% | -0.87% | 2.01x |
| pytorch | 1.8.0+cpu | bert_base_sts-b | 88.46% | 89.27% | -0.91% | 2.01x |
| pytorch | 1.8.0+cpu | bert_base_sst-2 | 91.97% | 91.86% | 0.12% | 2.00x |
| pytorch | 1.8.0+cpu | bert_base_rte | 69.68% | 69.68% | 0.00% | 1.92x |
| pytorch | 1.8.0+cpu | bert_large_mrpc | 87.60% | 88.33% | -0.83% | 2.47x |
| pytorch | 1.8.0+cpu | bert_large_squad | 92.99 | 93.05 | -0.06% | 1.85x |
| pytorch | 1.8.0+cpu | bert_large_qnli | 91.12% | 91.82% | -0.76% | 2.47x |
| pytorch | 1.8.0+cpu | bert_large_rte | 71.84% | 72.56% | -0.99% | 1.44x |
| pytorch | 1.8.0+cpu | bert_large_cola | 61.97% | 62.57% | -0.97% | 2.48x |
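
The "Acc Ratio" column follows the formula given in its header, (INT8-FP32)/FP32, expressed as a percentage. The short sketch below reproduces two of the TensorFlow rows from the accuracy figures in the table; the function name `acc_ratio` is hypothetical.

```python
def acc_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change of INT8 versus FP32, in percent."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0


# resnet50v1.0: 74.24% INT8 versus 74.27% FP32
print(f"{acc_ratio(74.24, 74.27):.2f}%")  # -0.04%
# mobilenetv1: 73.93% INT8 versus 70.96% FP32
print(f"{acc_ratio(73.93, 70.96):.2f}%")  # 4.19%
```

A negative value means the INT8 model is less accurate than the FP32 baseline, and a positive value means it scored higher.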

Validated Pruning Models

| Tasks | FWK | Model | fp32 baseline | gradient sensitivity with 20% sparsity: accuracy% | drop% | perf gain (sample/s) | +onnx dynamic quantization on pruned model: accuracy% | drop% | perf gain (sample/s) |
|---|---|---|---|---|---|---|---|---|---|
| SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x |
| QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x |

| Tasks | FWK | Model | fp32 baseline | Pattern Lock on 70% Unstructured Sparsity: accuracy% | drop% | Pattern Lock on 50% 1:2 Structured Sparsity: accuracy% | drop% |
|---|---|---|---|---|---|---|---|
| MNLI | pytorch | bert-base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80] |
| SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13 |
| QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31] |
| QNLI | pytorch | bert-base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73 |
| QnA | pytorch | bert-base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69] |

| Framework | Model | fp32 baseline | Compression | dataset | acc(drop)% |
|---|---|---|---|---|---|
| PyTorch | resnet18 | 69.76 | 30% sparsity on magnitude | ImageNet | 69.47(-0.42) |
| PyTorch | resnet18 | 69.76 | 30% sparsity on gradient sensitivity | ImageNet | 68.85(-1.30) |
| PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude | ImageNet | 76.11(-0.03) |
| PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude and post training quantization | ImageNet | 76.01(-0.16) |
| PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude and quantization aware training | ImageNet | 75.90(-0.30) |
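
As an illustration of what "30% sparsity on magnitude" means in the rows above, the sketch below zeroes out the 30% of weights with the smallest absolute values. It is a plain-Python illustration of magnitude pruning in general, not LPOT's pruning code; the function name and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, -0.02, 0.6, 0.1, -0.4]
pruned = magnitude_prune(weights, sparsity=0.3)
print(pruned)
```

In practice pruning is usually interleaved with training so the remaining weights can compensate, which is why the tables report accuracy after pruning rather than assuming it is unchanged.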

Additional Content
