English | [简体中文](README.md)

# PTQ INT8 Quantization

This is a script for fast PTQ (Post-Training Quantization) INT8 quantization using TensorRT, supporting both dynamic and static batching.

## Usage

First, configure the model you want to quantize in `calibration.yaml`.

`calibrator.data` is the path to the data used for calibration, and `calibrator.cache` is the location where the generated calibration cache is saved.

> If you choose **dynamic batching**, make sure the dimensions of **`batch_shape`** match **`shapes.opt`**. If you choose **static batching**, set **`dynamic`** to **`False`** and ignore **`shapes`**.
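
As an illustration, a `calibration.yaml` covering the settings mentioned above might look like the sketch below. Only `calibrator.data`, `calibrator.cache`, `dynamic`, `batch_shape`, and `shapes` are taken from this README; the `model` key, the paths, and the shape values are assumptions, so check the shipped `calibration.yaml` for the real schema:

```yaml
# Hypothetical sketch -- the real file in this repo is the authority.
model: model.onnx               # assumed key: the model to quantize
dynamic: True                   # set to False for static batching
batch_shape: [8, 3, 640, 640]   # must match shapes.opt when dynamic is True
shapes:
  min: [1, 3, 640, 640]
  opt: [8, 3, 640, 640]
  max: [16, 3, 640, 640]
calibrator:
  data: ./calib_data            # path to the calibration data
  cache: ./calib.cache          # where the generated calibration cache is written
```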

After configuring `calibration.yaml`, run the following commands to perform quantization:

```bash
cd tools
python ptq_calibration.py
```

The precision and latency after PTQ quantization vary from model to model. For maximum precision, QAT (Quantization-Aware Training) is recommended instead.
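
During calibration the data is fed through the network in fixed-size batches, so the calibration set has to be grouped to match `batch_shape`. A minimal NumPy-only sketch of that batching step (the function name and shapes are illustrative, not part of this repo's script):

```python
import numpy as np


def make_calibration_batches(samples, batch_size):
    """Group preprocessed CHW samples into full NCHW batches for INT8 calibration.

    Incomplete trailing batches are dropped, since the calibrator expects
    every batch to match batch_shape exactly.
    """
    batches = []
    for i in range(0, len(samples) - batch_size + 1, batch_size):
        # Stack CHW arrays into one (batch_size, C, H, W) float32 array
        batches.append(np.stack(samples[i:i + batch_size]).astype(np.float32))
    return batches


# Example: 10 dummy 3x640x640 images with a batch size of 4
samples = [np.zeros((3, 640, 640), dtype=np.float32) for _ in range(10)]
batches = make_calibration_batches(samples, batch_size=4)
print(len(batches), batches[0].shape)  # 2 full batches of shape (4, 3, 640, 640)
```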
[English](README.en.md) | 简体中文

# PTQ INT8 Quantization

This is a script for fast PTQ (Post-Training Quantization) INT8 quantization using TensorRT, supporting both dynamic and static batching.

## Usage

First, configure the model you want to quantize in `calibration.yaml`.

`calibrator.data` is the path to the data used for calibration, and `calibrator.cache` is the location where the generated calibration cache is saved.

> If you choose **dynamic batching**, make sure the dimensions of **`batch_shape`** match **`shapes.opt`**. If you choose **static batching**, set **`dynamic`** to **`False`** and ignore **`shapes`**.

After configuring `calibration.yaml`, run the following commands to perform quantization:

```bash
cd tools
python ptq_calibration.py
```

The precision and latency after PTQ quantization vary from model to model. For maximum precision, QAT (Quantization-Aware Training) is recommended instead.