Intel Neural Compressor Release 3.1

Latest

chensuyue released this 25 Oct 08:18

a8cd9aa

Highlights
Features
Improvements
Validated Hardware
Validated Configurations

Highlights

Aligned with the Habana 1.18 release, with improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
Provided a Transformer-like quantization API for weight-only quantization of LLMs, giving Transformers users a one-stop experience for quantization and inference with IPEX on Intel GPU and CPU

Features

Add a Transformer-like quantization API for weight-only quantization of LLMs
Support fast quantization with a lightweight recipe and a layer-wise approach on Intel AI PC
Support INT4 quantization of Visual Language Models (VLMs) such as Llava, Phi-3-vision, and Qwen-VL with the AutoRound algorithm
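
For readers new to the term, the following is a minimal sketch of what weight-only INT4 quantization means: weights are split into small groups, and each group is stored as 4-bit integer codes plus a floating-point scale and minimum. It uses plain round-to-nearest, not AutoRound, and it is not Intel Neural Compressor's implementation. The function names and the group size of 4 are assumptions chosen for illustration.

```python
from typing import List, Tuple

# One quantized group: (4-bit codes, scale, group minimum).
Group = Tuple[List[int], float, float]


def quantize_group_int4(weights: List[float]) -> Group:
    """Round-to-nearest asymmetric quantization of one group to codes 0..15."""
    w_min, w_max = min(weights), max(weights)
    # 4 bits give 16 levels; fall back to 1.0 when the group is constant.
    scale = (w_max - w_min) / 15 or 1.0
    codes = [min(15, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(group: Group) -> List[float]:
    """Recover approximate floating-point weights from one group."""
    codes, scale, w_min = group
    return [c * scale + w_min for c in codes]


def quantize_row(row: List[float], group_size: int = 4) -> List[Group]:
    """Quantize a weight row group by group (illustrative group size)."""
    return [
        quantize_group_int4(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Concatenate the dequantized groups back into one row."""
    return [w for g in groups for w in dequantize_group(g)]
```

Only the weights are quantized here; activations stay in floating point, which is what distinguishes weight-only quantization from full INT8 or INT4 inference. Each reconstructed weight differs from the original by at most half of its group's scale.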

Improvements

Support loading and converting AWQ-format INT4 models for IPEX inference in the Transformer-like API
Enable auto-round format export for INT4 models
Support per-channel INT8 Post Training Quantization for PT2E
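
As background on the per-channel item above, here is a small sketch contrasting per-tensor and per-channel symmetric INT8 quantization. It is a conceptual illustration in plain Python, not the PT2E or Intel Neural Compressor code path. The helper names and the example matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def per_tensor_scale(matrix: Matrix) -> float:
    """One symmetric INT8 scale shared by the whole weight matrix."""
    return max(abs(v) for row in matrix for v in row) / 127 or 1.0


def per_channel_scales(matrix: Matrix) -> List[float]:
    """One symmetric INT8 scale per output channel (here: per row)."""
    return [max(abs(v) for v in row) / 127 or 1.0 for row in matrix]


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """Symmetric round-to-nearest quantization, clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def max_abs_error(values: List[float], scale: float) -> float:
    """Largest reconstruction error after a quantize/dequantize round trip."""
    codes = quantize_int8(values, scale)
    return max(abs(v - q * scale) for v, q in zip(values, codes))


# Hypothetical weights: channel 0 is tiny, channel 1 is large.
W = [[0.01, -0.02, 0.015], [5.0, -3.0, 4.0]]
shared = per_tensor_scale(W)
per_row = per_channel_scales(W)
```

With a shared scale, the large channel dictates the step size and the small channel collapses to a few integer levels. Giving each channel its own scale keeps the small channel's error proportional to its own range.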

Validated Hardware 

Intel Gaudi AI Accelerators (Gaudi 2 and 3)
Intel Xeon Scalable processor (4th, 5th, 6th Gen)
Intel Core Ultra Processors (Series 1 and 2)
Intel Data Center GPU Max Series (1100)

Validated Configurations

CentOS 8.4, Ubuntu 22.04, and Windows 11
Python 3.9, 3.10, 3.11, 3.12
PyTorch/IPEX 2.2, 2.3, 2.4
