Merge branch 'master' into kaihui/ar_v0.3
chensuyue authored Jul 22, 2024
2 parents a1feaf5 + e106dea commit 967b780
Showing 43 changed files with 2,365 additions and 350 deletions.
6 changes: 6 additions & 0 deletions .azure-pipelines/scripts/codeScan/pydocstyle/scan_path.txt
@@ -15,3 +15,9 @@
/neural-compressor/neural_compressor/strategy
/neural-compressor/neural_compressor/training.py
/neural-compressor/neural_compressor/utils
/neural-compressor/neural_compressor/torch/algorithms/static_quant
/neural-compressor/neural_compressor/torch/algorithms/smooth_quant
/neural_compressor/torch/algorithms/pt2e_quant
/neural_compressor/torch/export
/neural_compressor/common
/neural_compressor/torch/algorithms/weight_only/hqq
30 changes: 19 additions & 11 deletions docs/3x/PT_MixedPrecision.md
@@ -8,13 +8,17 @@ PyTorch Mixed Precision

## Introduction

The recent growth of Deep Learning has driven the development of more complex models that require significantly more compute and memory capabilities. Several low-precision numeric formats have been proposed to address the problem.
Google's [bfloat16](https://cloud.google.com/tpu/docs/bfloat16) and the [FP16: IEEE](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) half-precision format are two of the most widely used sixteen-bit formats. [Mixed precision](https://arxiv.org/abs/1710.03740) training and inference using low-precision formats have been developed to reduce compute and bandwidth requirements.

The 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as a part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions allow converting to and from bfloat16 data type, while the last one performs a dot product of bfloat16 pairs.
Further details can be found in the [Hardware Numerics Document](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-deep-learning-boost-new-instruction-bfloat16.html) published by Intel.
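
For a concrete feel for the format, the following is a minimal PyTorch sketch (the tensor shapes are arbitrary, chosen only for illustration) that casts FP32 tensors to bfloat16 and measures the rounding error introduced by the narrower 7-bit mantissa:

```python
import torch

# Random FP32 activations and weights; shapes are arbitrary for illustration.
x_fp32 = torch.randn(64, 256)
w_fp32 = torch.randn(256, 128)

# bfloat16 keeps the 8-bit exponent range of FP32 but truncates the mantissa to 7 bits.
x_bf16 = x_fp32.to(torch.bfloat16)
w_bf16 = w_fp32.to(torch.bfloat16)

# Multiply in both precisions and compare the results in FP32.
ref = x_fp32 @ w_fp32
out = (x_bf16 @ w_bf16).to(torch.float32)
print("max abs error vs FP32:", (ref - out).abs().max().item())
```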

The 4th Gen Intel® Xeon® Scalable processor supports the FP16 instruction set architecture (ISA) for Intel® Advanced Vector Extensions 512 (Intel® AVX-512). The new ISA supports a wide range of general-purpose numeric operations for 16-bit half-precision IEEE-754 floating point and complements the existing 32-bit and 64-bit floating-point instructions already available in Intel Xeon processor-based products.
Further details can be found in the [Intel AVX512 FP16 Guide](https://www.intel.com/content/www/us/en/content-details/669773/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide.html) published by Intel.

The latest Intel Xeon processors deliver the flexibility of Intel Advanced Matrix Extensions (Intel AMX), an accelerator that improves the performance of deep learning (DL) training and inference, making it ideal for workloads like NLP, recommender systems, and image recognition. Developers can code AI functionality to take advantage of the Intel AMX instruction set, and they can code non-AI functionality to use the processor instruction set architecture (ISA). Intel has integrated the Intel® oneAPI Deep Neural Network Library (oneDNN), its oneAPI DL engine, into PyTorch.
Further details can be found in the [Intel AMX Document](https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-ai-workloads-with-intel-advanced-matrix-extensions-intel-amx.html) published by Intel.
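
On CPUs with these extensions, PyTorch dispatches reduced-precision kernels through oneDNN. As a minimal sketch of what bf16 inference can look like on the PyTorch side (the ResNet model and input shape are only placeholders, and `weights=None` assumes torchvision 0.13 or newer), CPU autocast can be used:

```python
import torch
import torchvision.models as models

# Placeholder model and input; any FP32 model works the same way.
model = models.resnet18(weights=None).eval()
inputs = torch.randn(1, 3, 224, 224)

# Under CPU autocast, eligible ops (convolutions, matmuls, ...) run in bfloat16,
# while precision-sensitive ops stay in FP32.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    outputs = model(inputs)

print(outputs.dtype)  # usually torch.bfloat16 for the final autocast-eligible op
```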

<p align="center" width="100%">
<img src="./imgs/data_format.png" alt="Architecture" height=230>
@@ -58,6 +62,9 @@ operations for 16-bit half-precision IEEE-754 floating-point and complements the
- PyTorch
1. Hardware: CPU supports `avx512_fp16` instruction set.
2. Software: torch >= [1.11.0](https://download.pytorch.org/whl/torch_stable.html).
> Note: To run FP16 on Intel AMX, please set the environment variable `ONEDNN_MAX_CPU_ISA`:
> `export ONEDNN_MAX_CPU_ISA=AVX512_CORE_AMX_FP16`
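
When the variable is set from Python instead of the shell, a reasonable assumption is that it must be in place before oneDNN selects its ISA, i.e. before `torch` is first imported in the process; a minimal sketch:

```python
import os

# Assumption: set before the first `import torch` so oneDNN picks up the ISA cap.
os.environ["ONEDNN_MAX_CPU_ISA"] = "AVX512_CORE_AMX_FP16"

import torch  # imported only after the environment variable is in place
```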


### Accuracy-driven mixed precision
@@ -68,36 +75,37 @@ Note that the IPEX backend doesn't support accuracy-driven mixed precision.

## Get Started with autotune API

To get a bf16/fp16 model, users can use the `autotune` interface with `MixedPrecisionConfig` as follows.

- BF16:

```python
from neural_compressor.torch.quantization import MixedPrecisionConfig, TuningConfig, autotune

def eval_acc_fn(model):
    ...
    return acc

# modules might fall back to fp32 to get better accuracy
custom_tune_config = TuningConfig(config_set=[MixedPrecisionConfig(dtype=["bf16", "fp32"])], max_trials=3)
best_model = autotune(model=build_torch_model(), tune_config=custom_tune_config, eval_fn=eval_acc_fn)
```

- FP16:

```python
from neural_compressor.torch.quantization import MixedPrecisionConfig, TuningConfig, autotune

def eval_acc_fn(model):
    ...
    return acc

# modules might fall back to fp32 to get better accuracy
custom_tune_config = TuningConfig(config_set=[MixedPrecisionConfig(dtype=["fp16", "fp32"])], max_trials=3)
best_model = autotune(model=build_torch_model(), tune_config=custom_tune_config, eval_fn=eval_acc_fn)
```
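
In both snippets, `eval_fn` only has to return a single accuracy-like float that `autotune` can maximize. A minimal sketch of such a function, using a dummy dataset in place of a real validation loader (both the loader and the top-1 metric are assumptions, not part of the API), might look like:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy validation data standing in for a real ImageNet-style loader (assumption).
val_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))),
    batch_size=4,
)


def eval_acc_fn(model):
    """Return top-1 accuracy of `model` over `val_loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```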

## Examples

Users can also refer to the [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/cv/mixed_precision) on how to optimize a model with mixed precision.
7 changes: 7 additions & 0 deletions examples/.config/model_params_pytorch_3x.json
@@ -146,6 +146,13 @@
"input_model": "",
"main_script": "run_clm_no_trainer.py",
"batch_size": 1
},
"resnet18_mixed_precision": {
"model_src_dir": "cv/mixed_precision",
"dataset_location": "/tf_dataset/pytorch/ImageNet/raw",
"input_model": "resnet18",
"main_script": "main.py",
"batch_size": 100
}
}
}
47 changes: 47 additions & 0 deletions examples/3.x_api/pytorch/cv/mixed_precision/README.md
@@ -0,0 +1,47 @@
Step-by-Step
============

This document provides step-by-step instructions for reproducing PyTorch ResNet18 mixed-precision results with Intel® Neural Compressor.

# Prerequisite

### 1. Environment

PyTorch 1.8 or a later version with the pytorch_fx backend is required.

```Shell
cd examples/3.x_api/pytorch/cv/mixed_precision
pip install -r requirements.txt
```
> Note: Validated PyTorch [Version](/docs/source/installation_guide.md#validated-software-environment).
### 2. Prepare Dataset

Download the [ImageNet](http://www.image-net.org/) raw images to a directory such as /path/to/imagenet. The directory should include the folders below:

```bash
ls /path/to/imagenet
train val
```

# Run

> Note: Any torchvision model name can be passed as long as it is included in `torchvision.models`; below are some examples.
## MixedPrecision
```Shell
bash run_autotune.sh --input_model=resnet18 --dataset_location=/path/to/imagenet
```

## Benchmark
```Shell
# run optimized performance
bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=performance --batch_size=100 --optimized=true --iters=500
# run optimized accuracy
bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=accuracy --batch_size=1 --optimized=true
```
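
A plausible way for the run scripts to resolve an arbitrary `--input_model` name is a plain attribute lookup in `torchvision.models`; the sketch below shows that pattern (the `weights="DEFAULT"` argument assumes torchvision 0.13 or newer, and the lookup itself is an assumption about how `main.py` resolves the name, not a documented interface):

```python
import torchvision.models as models

model_name = "resnet18"  # any name defined in torchvision.models works here
model = getattr(models, model_name)(weights="DEFAULT")  # load the default pretrained weights
model.eval()
```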




