Commit

Merge remote-tracking branch 'upstream/develop' into andreyan/scale_estimation_pr
andreyanufr committed Apr 25, 2024
2 parents afcf3b3 + 17a5b65 commit 140c31f
Showing 91 changed files with 13,864 additions and 8,464 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/precommit.yml
@@ -64,7 +64,7 @@ jobs:
name: coverage_onnx
flags: ONNX
openvino:
-runs-on: ubuntu-20.04
+runs-on: ubuntu-20.04-8-cores
steps:
- uses: actions/checkout@v3
with:
2 changes: 1 addition & 1 deletion .mypy.ini
@@ -1,5 +1,5 @@
[mypy]
-files = nncf/common/sparsity, nncf/common/graph
+files = nncf/common/sparsity, nncf/common/graph, nncf/common/accuracy_aware_training/
follow_imports = silent
strict = True

2 changes: 1 addition & 1 deletion Makefile
@@ -74,7 +74,7 @@ install-openvino-dev: install-openvino-test install-pre-commit
pip install -r examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt

test-openvino:
-ONEDNN_MAX_CPU_ISA=AVX2 pytest ${COVERAGE_ARGS} tests/openvino $(DATA_ARG) --junitxml ${JUNITXML_PATH}
+ONEDNN_MAX_CPU_ISA=AVX2 pytest ${COVERAGE_ARGS} -n4 -ra tests/openvino $(DATA_ARG) --junitxml ${JUNITXML_PATH}

test-install-openvino:
pytest tests/cross_fw/install -s \
52 changes: 52 additions & 0 deletions ReleaseNotes.md
@@ -1,5 +1,57 @@
# Release Notes

## New in Release 2.10.0

Post-training Quantization:

- Features:
  - Introduced subgraph-defining functionality for the `nncf.IgnoredScope()` option.
  - Introduced limited support for batch sizes greater than 1. The MobilenetV2 [PyTorch example](examples/post_training_quantization/torch/mobilenet_v2) was updated with batch support.
- Fixes:
  - Fixed an issue with the absence of the `nncf.OverflowFix` parameter in some scenarios.
  - Aligned the list of correctable layers for the FastBiasCorrection algorithm between the PyTorch, OpenVINO and ONNX backends.
  - Fixed an issue with the combination of `nncf.QuantizationMode` parameters.
  - Fixed the MobilenetV2 ([PyTorch](examples/post_training_quantization/torch/mobilenet_v2), [ONNX](examples/post_training_quantization/onnx/mobilenet_v2), [OpenVINO](examples/post_training_quantization/openvino/mobilenet_v2)) examples for the Windows platform.
  - (OpenVINO) Fixed the [Anomaly Classification example](examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control) for the Windows platform.
  - (PyTorch) Fixed bias shift magnitude calculation for fused layers.
  - (OpenVINO) Fixed the removal of the ShapeOf graph, which led to an error in the `nncf.quantize_with_accuracy_control()` method.
- Improvements:
  - `OverflowFix`, `AdvancedSmoothQuantParameters` and `AdvancedBiasCorrectionParameters` were exposed in the `nncf.*` namespace.
  - (OpenVINO, PyTorch) Introduced scale compression to FP16 for weights in the `nncf.compress_weights()` method, regardless of model weights precision.
  - (PyTorch) Modules that NNCF inserted were excluded from parameter tracing.
  - (OpenVINO) Extended the list of correctable layers for the BiasCorrection algorithm.
  - (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.
- Tutorials:
  - [Post-Training Optimization of PhotoMaker Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/photo-maker/photo-maker.ipynb)
  - [Post-Training Optimization of Stable Diffusion XL Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/stable-diffusion-xl/stable-diffusion-xl.ipynb)
  - [Post-Training Optimization of KerasCV Stable Diffusion Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/stable-diffusion-keras-cv/stable-diffusion-keras-cv.ipynb)
  - [Post-Training Optimization of Paint By Example Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/paint-by-example/paint-by-example.ipynb)
  - [Post-Training Optimization of aMUSEd Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/amused-lightweight-text-to-image/amused-lightweight-text-to-image.ipynb)
  - [Post-Training Optimization of InstantID Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/instant-id/instant-id.ipynb)
  - [Post-Training Optimization of LLaVA Next Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llava-next-multimodal-chatbot/llava-next-multimodal-chatbot.ipynb)
  - [Post-Training Optimization of AnimateAnyone Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/animate-anyone/animate-anyone.ipynb)
  - [Post-Training Optimization of YOLOv8-OBB Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov8-optimization/yolov8-obb.ipynb)
  - [Post-Training Optimization of LLM Agent](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb)

Compression-aware training:

- Features:
  - (PyTorch) The `nncf.quantize` method may now be used as quantization initialization for Quantization-Aware Training. Added a [Resnet18-based example](examples/quantization_aware_training/torch/resnet18) with the transition from the Post-Training Quantization to a Quantization-Aware Training algorithm.
  - (PyTorch) Introduced extractors for the fused Convolution, Batch-/GroupNorm, and Linear functions.
- Fixes:
  - (PyTorch) Fixed `apply_args_defaults` function issue.
  - (PyTorch) Fixed `dtype` handling for the compressed `torch.nn.Parameter`.
  - (PyTorch) Fixed `is_shared` parameter propagation.
- Improvements:
  - (PyTorch) Updated command creation behaviour to reduce the number of adapters.
  - (PyTorch) Added option to insert point for models that are wrapped with `replace_modules=False`.
- Deprecations/Removals:
  - (PyTorch) Removed the `binarization` algorithm.
  - NNCF installation via the `pip install nncf[<framework>]` option is now deprecated.
- Requirements:
  - Updated PyTorch (2.2.1) and CUDA (12.1) versions.
  - Updated ONNX (1.16.0) and ONNXRuntime (1.17.1) versions.

## New in Release 2.9.0

Post-training Quantization:
10 changes: 10 additions & 0 deletions docs/compression_algorithms/post_training/Quantization.md
@@ -91,3 +91,13 @@ NNCF provides the examples of Post-Training Quantization where you can find the
function: [PyTorch](../../../examples/post_training_quantization/torch/mobilenet_v2/README.md), [TensorFlow](../../../examples/post_training_quantization/tensorflow/mobilenet_v2/README.md), [ONNX](../../../examples/post_training_quantization/onnx/mobilenet_v2/README.md), and [OpenVINO](../../../examples/post_training_quantization/openvino/mobilenet_v2/README.md)

If the Post-Training Quantization algorithm does not meet quality requirements, you can fine-tune the quantized PyTorch model. An example of the Quantization-Aware Training pipeline for a PyTorch model can be found [here](../../../examples/quantization_aware_training/torch/resnet18/README.md).

## Using `torch.utils.data.DataLoader` or `tf.data.Dataset` as a data source for the calibration dataset

```batch_size``` is a parameter of a dataloader that refers to the number of samples or data points propagated through the neural network in a single pass.

NNCF allows for dataloaders with different batch sizes, but there are limitations. For models like transformers or those with unconventional tensor structures, such as the batch axis not being in the expected position, using batch sizes larger than 1 for quantization isn't supported. This happens because the internal data arrangement of such models may not match the assumptions made during quantization, which leads to inaccurate statistics when the batch size is larger than 1.

Please keep in mind that you have to recalculate the subset size for quantization according to the batch size using the following formula: ```subset_size = subset_size_for_batch_size_1 // batch_size```.
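
The formula above is plain integer division. The following is a minimal sketch of it, not NNCF code: the helper name `recalculate_subset_size`, the sample value 300, and the clamp to at least one batch are assumptions made for illustration.

```python
def recalculate_subset_size(subset_size_for_batch_size_1: int, batch_size: int) -> int:
    """Return the subset size (in batches) for a dataloader with the given batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Formula from the text: subset_size = subset_size_for_batch_size_1 // batch_size
    subset_size = subset_size_for_batch_size_1 // batch_size
    # Assumption (not from the NNCF docs): keep at least one batch for calibration
    return max(1, subset_size)


# 300 samples at batch size 1 correspond to 2 batches at batch size 128
print(recalculate_subset_size(300, 128))  # prints 2
```

The result would then be passed as the subset size when quantizing with a batched dataloader, so that roughly the same number of samples is used for statistics collection.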

[Example](../../../examples/post_training_quantization/torch/mobilenet_v2/README.md) with post-training quantization for PyTorch with a dataloader having a ```batch_size``` of 128.
37 changes: 37 additions & 0 deletions examples/quantization_aware_training/torch/anomalib/README.md
@@ -0,0 +1,37 @@
# Quantization-Aware Training of STFPM PyTorch model from Anomalib

Anomaly detection models are often used in scenarios where the cost of a model error is high and accuracy cannot be sacrificed for better performance. Quantization-Aware Training (QAT) suits such cases well: by training the quantized model, it reduces quantization error without degrading model performance.

This example demonstrates how to quantize the [Student-Teacher Feature Pyramid Matching (STFPM)](https://anomalib.readthedocs.io/en/latest/markdown/guides/reference/models/image/stfpm.html) PyTorch model from [Anomalib](https://github.com/openvinotoolkit/anomalib) using the Quantization API from the Neural Network Compression Framework (NNCF). In the first step, the model is quantized using the Post-Training Quantization (PTQ) algorithm to obtain the best initialization of the quantized model. If the accuracy of the quantized model after PTQ does not meet requirements, the next step is to train the quantized model using the PyTorch framework.

NNCF provides a seamless transition from Post-Training Quantization to Quantization-Aware Training without additional model preparation or transfer of magic parameters.

The example includes the following steps:

- Loading the [MVTec (capsule category)](https://www.mvtec.com/company/research/datasets/mvtec-ad) dataset (~4.9 Gb).
- (Optional) Training STFPM PyTorch model from scratch.
- Loading STFPM model pretrained on this dataset.
- Quantizing the model using NNCF Post-Training Quantization algorithm.
- Fine-tuning quantized model for one epoch to improve quantized model metrics.
- Output of the following characteristics of the quantized model:
  - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32)
  - Compression rate of the quantized model file size relative to the pre-trained model file size
  - Performance speed up of the quantized model (INT8)
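
The three reported characteristics reduce to simple arithmetic on values measured for the FP32 and INT8 models. The sketch below illustrates that arithmetic only; the `Summary` class, the `summarize` helper, and all input numbers are hypothetical and are not measured results.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    accuracy_drop: float      # FP32 metric minus INT8 metric
    compression_rate: float   # FP32 file size divided by INT8 file size
    speed_up: float           # INT8 throughput divided by FP32 throughput


def summarize(
    fp32_metric: float,
    int8_metric: float,
    fp32_size: float,
    int8_size: float,
    fp32_fps: float,
    int8_fps: float,
) -> Summary:
    """Compute the characteristics listed above from per-model measurements."""
    return Summary(
        accuracy_drop=fp32_metric - int8_metric,
        compression_rate=fp32_size / int8_size,
        speed_up=int8_fps / fp32_fps,
    )


# Hypothetical inputs chosen only to make the arithmetic easy to follow
summary = summarize(
    fp32_metric=0.75, int8_metric=0.5,
    fp32_size=20.0, int8_size=5.0,
    fp32_fps=100.0, int8_fps=250.0,
)
print(f"Accuracy drop: {summary.accuracy_drop:.3f}")
print(f"Model compression rate: {summary.compression_rate:.3f}")
print(f"Performance speed up: {summary.speed_up:.3f}")
```

A positive accuracy drop means the INT8 model scores lower than the FP32 model, while compression rate and speed up above 1 mean the INT8 model is smaller and faster.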

## Install requirements

At this point, it is assumed that you have already installed NNCF. You can find information on installation of NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation).

To work with the example you should install the corresponding Python package dependencies:

```bash
pip install -r requirements.txt
```

## Run Example

The example does not require additional preparation: it loads the dataset and the model by itself.

```bash
python main.py
```
190 changes: 190 additions & 0 deletions examples/quantization_aware_training/torch/anomalib/main.py
@@ -0,0 +1,190 @@
# Copyright (c) 2024 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import re
import subprocess
import tarfile
from copy import deepcopy
from pathlib import Path
from typing import List
from urllib.request import urlretrieve

import torch
from anomalib import TaskType
from anomalib.data import MVTec
from anomalib.data.image import mvtec
from anomalib.data.utils import download
from anomalib.deploy import ExportType
from anomalib.engine import Engine
from anomalib.models import Stfpm

import nncf

HOME_PATH = Path.home()
DATASET_PATH = HOME_PATH / ".cache/nncf/datasets/mvtec"
CHECKPOINT_PATH = HOME_PATH / ".cache/nncf/models/anomalib"
ROOT = Path(__file__).parent.resolve()
FP32_RESULTS_ROOT = ROOT / "fp32"
INT8_RESULTS_ROOT = ROOT / "int8"
CHECKPOINT_URL = "https://storage.openvinotoolkit.org/repositories/nncf/examples/torch/anomalib/stfpm_mvtec.ckpt"
USE_PRETRAINED = True


def download_and_extract(root: Path, info: download.DownloadInfo) -> None:
    root.mkdir(parents=True, exist_ok=True)
    downloaded_file_path = root / info.url.split("/")[-1]
    print(f"Downloading the {info.name} dataset.")
    with download.DownloadProgressBar(unit="B", unit_scale=True, miniters=1, desc=info.name) as progress_bar:
        urlretrieve(
            url=f"{info.url}",
            filename=downloaded_file_path,
            reporthook=progress_bar.update_to,
        )
    print("Checking the hash of the downloaded file.")
    download.check_hash(downloaded_file_path, info.hashsum)
    print(f"Extracting the {info.name} dataset.")
    with tarfile.open(downloaded_file_path) as tar_file:
        tar_file.extractall(root)
    print("Cleaning up files.")
    downloaded_file_path.unlink()


def create_dataset(root: Path) -> MVTec:
    if not root.exists():
        download_and_extract(root, mvtec.DOWNLOAD_INFO)
    return MVTec(root)


def run_benchmark(model_path: Path, shape: List[int]) -> float:
    command = f"benchmark_app -m {model_path} -d CPU -api async -t 15"
    command += f' -shape "[{",".join(str(x) for x in shape)}]"'
    cmd_output = subprocess.check_output(command, shell=True)  # nosec
    print(*str(cmd_output).split("\\n")[-9:-1], sep="\n")
    match = re.search(r"Throughput\: (.+?) FPS", str(cmd_output))
    return float(match.group(1))


def get_model_size(ir_path: Path, m_type: str = "Mb") -> float:
    xml_size = ir_path.stat().st_size
    bin_size = ir_path.with_suffix(".bin").stat().st_size
    # Divide by 1024 until the requested unit is reached
    for t in ["bytes", "Kb", "Mb"]:
        if m_type == t:
            break
        xml_size /= 1024
        bin_size /= 1024
    model_size = xml_size + bin_size
    # Report sizes in the requested unit instead of a hard-coded "Mb"
    print(f"Model graph (xml): {xml_size:.3f} {m_type}")
    print(f"Model weights (bin): {bin_size:.3f} {m_type}")
    print(f"Model size: {model_size:.3f} {m_type}")
    return model_size


def main():
    ###############################################################################
    # Step 1: Prepare the model and dataset
    print(os.linesep + "[Step 1] Prepare the model and dataset")

    model = Stfpm()
    datamodule = create_dataset(root=DATASET_PATH)

    # Create an engine for the original model
    engine = Engine(task=TaskType.SEGMENTATION, default_root_dir=FP32_RESULTS_ROOT, devices=1)
    if USE_PRETRAINED:
        # Load the pretrained checkpoint
        CHECKPOINT_PATH.mkdir(parents=True, exist_ok=True)
        ckpt_path = CHECKPOINT_PATH / "stfpm_mvtec.ckpt"
        torch.hub.download_url_to_file(CHECKPOINT_URL, ckpt_path)
    else:
        # (Optional) Train the model from scratch
        engine.fit(model=model, datamodule=datamodule)
        ckpt_path = engine.trainer.checkpoint_callback.best_model_path

    print("Test results for original FP32 model:")
    fp32_test_results = engine.test(model=model, datamodule=datamodule, ckpt_path=ckpt_path)

    ###############################################################################
    # Step 2: Quantize the model
    print(os.linesep + "[Step 2] Quantize the model")

    # Create calibration dataset
    def transform_fn(data_item):
        return data_item["image"]

    test_loader = datamodule.test_dataloader()
    calibration_dataset = nncf.Dataset(test_loader, transform_fn)

    # Quantize the inference model using Post-Training Quantization
    inference_model = model.model
    quantized_inference_model = nncf.quantize(model=inference_model, calibration_dataset=calibration_dataset)

    # Deepcopy the original model and set the quantized inference model
    quantized_model = deepcopy(model)
    quantized_model.model = quantized_inference_model

    # Create engine for the quantized model
    engine = Engine(task=TaskType.SEGMENTATION, default_root_dir=INT8_RESULTS_ROOT, max_epochs=1, devices=1)

    # Validate the quantized model
    print("Test results for INT8 model after PTQ:")
    int8_init_test_results = engine.test(model=quantized_model, datamodule=datamodule)

    ###############################################################################
    # Step 3: Fine tune the quantized model
    print(os.linesep + "[Step 3] Fine tune the quantized model")

    engine.fit(model=quantized_model, datamodule=datamodule)
    print("Test results for INT8 model after QAT:")
    int8_test_results = engine.test(model=quantized_model, datamodule=datamodule)

    ###############################################################################
    # Step 4: Export models
    print(os.linesep + "[Step 4] Export models")

    # Export FP32 model to OpenVINO™ IR
    fp32_ir_path = engine.export(model=model, export_type=ExportType.OPENVINO, export_root=FP32_RESULTS_ROOT)
    print(f"Original model path: {fp32_ir_path}")
    fp32_size = get_model_size(fp32_ir_path)

    # Export INT8 model to OpenVINO™ IR
    int8_ir_path = engine.export(model=quantized_model, export_type=ExportType.OPENVINO, export_root=INT8_RESULTS_ROOT)
    print(f"Quantized model path: {int8_ir_path}")
    int8_size = get_model_size(int8_ir_path)

    ###############################################################################
    # Step 5: Run benchmarks
    print(os.linesep + "[Step 5] Run benchmarks")

    print("Run benchmark for FP32 model (IR)...")
    fp32_fps = run_benchmark(fp32_ir_path, shape=[1, 3, 256, 256])

    print("Run benchmark for INT8 model (IR)...")
    int8_fps = run_benchmark(int8_ir_path, shape=[1, 3, 256, 256])

    ###############################################################################
    # Step 6: Summary
    print(os.linesep + "[Step 6] Summary")

    fp32_f1score = fp32_test_results[0]["image_F1Score"]
    int8_init_f1score = int8_init_test_results[0]["image_F1Score"]
    int8_f1score = int8_test_results[0]["image_F1Score"]

    print(f"Accuracy drop after PTQ: {fp32_f1score - int8_init_f1score:.3f}")
    print(f"Accuracy drop after QAT: {fp32_f1score - int8_f1score:.3f}")
    print(f"Model compression rate: {fp32_size / int8_size:.3f}")
    # https://docs.openvino.ai/latest/openvino_docs_optimization_guide_dldt_optimization_guide.html
    print(f"Performance speed up (throughput mode): {int8_fps / fp32_fps:.3f}")

    return fp32_f1score, int8_init_f1score, int8_f1score, fp32_fps, int8_fps, fp32_size, int8_size


if __name__ == "__main__":
    main()
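
Review note: `run_benchmark` in this file calls `match.group(1)` without checking whether the regular expression matched, so a missing `Throughput` line would surface as an `AttributeError`. The following is one possible defensive sketch, not part of the commit: the `parse_throughput` helper and the sample output text are hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "Throughput: 123.45 FPS" and captures the number
THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS")


def parse_throughput(benchmark_output: str) -> Optional[float]:
    """Return the throughput in FPS, or None if no throughput line is found."""
    match = THROUGHPUT_RE.search(benchmark_output)
    if match is None:
        return None
    return float(match.group(1))


# Made-up sample text for illustration; real tool output may differ in layout
sample_output = "[ INFO ] Latency: 10.00 ms\n[ INFO ] Throughput:   123.45 FPS\n"
print(parse_throughput(sample_output))
print(parse_throughput("no throughput reported"))
```

Returning `None` lets the caller decide whether to raise a descriptive error or skip the speed-up calculation.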
@@ -0,0 +1 @@
anomalib[core,openvino]==1.0.0