From 8610838c127f2393aebc27c6b8b325ab7fce957d Mon Sep 17 00:00:00 2001
From: laugh12321
Date: Tue, 23 Apr 2024 19:21:52 +0800
Subject: [PATCH] Update README

---
 tools/README.en.md | 22 ++++++++++++++++++++++
 tools/README.md    | 22 ++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/README.en.md b/tools/README.en.md
index e69de29..fac5724 100644
--- a/tools/README.en.md
+++ b/tools/README.en.md
@@ -0,0 +1,22 @@
+English | [简体中文](README.md)
+
+# PTQ INT8 Quantization
+
+This is a script for fast PTQ (Post Training Quantization) INT8 quantization using TensorRT, supporting both dynamic and static batching.
+
+## Usage
+
+First, configure the model you want to quantize in `calibration.yaml`.
+
+`calibrator.data` is the path to the data used for calibration, and `calibrator.cache` is the location to save the generated calibration files.
+
+> If you choose **dynamic batching**, ensure that the dimensions of **`batch_shape`** match **`shapes.opt`**. If you choose **static batching**, set **`dynamic`** to **`False`**, and ignore **`shapes`**.
+
+After configuring `calibration.yaml`, run the following command to perform quantization:
+
+```bash
+cd tools
+python ptq_calibration.py
+```
+
+The precision and latency after PTQ quantization vary depending on the model. For maximum precision, it is recommended to use QAT quantization.
\ No newline at end of file
diff --git a/tools/README.md b/tools/README.md
index e69de29..d4b4041 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -0,0 +1,22 @@
+[English](README.en.md) | 简体中文
+
+# PTQ INT8 量化
+
+这是一个使用 TensorRT 进行快速 PTQ(Post Training Quantization)INT8 量化的脚本,支持动态和静态 Batch。
+
+## 使用方法
+
+首先,在 `calibration.yaml` 中配置你要量化的模型。
+
+`calibrator.data` 是用于校准的数据路径,而 `calibrator.cache` 则是保存生成的校准文件的位置。
+
+> 如果你选择 **动态 Batch**,务必确保 **`batch_shape`** 的维度与 **`shapes.opt`** 一致;如果你选择 **静态 Batch**,将 **`dynamic`** 设为 **`False`**,并**忽略 `shapes`**。
+
+配置好 `calibration.yaml` 后,运行以下命令进行量化:
+
+```bash
+cd tools
+python ptq_calibration.py
+```
+
+PTQ 量化后的精度与延时因模型而异,如果追求最高精度,建议使用 QAT 量化。
\ No newline at end of file
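
Note on the batching rule documented in both READMEs: it can be checked before `ptq_calibration.py` is run. The sketch below is only an illustration and is not taken from the script. The config layout (top-level `dynamic` and `batch_shape`, plus a `shapes` mapping with an `opt` entry) and the helper name `check_batching` are assumptions, since this patch does not include `calibration.yaml`. Reading "match" as element-wise equality is also an assumption.

```python
from typing import Any, Mapping


def check_batching(cfg: Mapping[str, Any]) -> None:
    """Raise ValueError if the batching settings contradict the README rule.

    The key layout is hypothetical: the patch names `dynamic`, `batch_shape`,
    and `shapes.opt` but does not show where they sit in calibration.yaml.
    """
    if not cfg.get("dynamic", False):
        # Static batching: the README says `shapes` is ignored.
        return
    batch_shape = list(cfg["batch_shape"])
    opt_shape = list(cfg["shapes"]["opt"])
    if batch_shape != opt_shape:
        raise ValueError(
            f"batch_shape {batch_shape} must match shapes.opt {opt_shape}"
        )


# Example configs as plain dicts (what a YAML loader would typically return).
dynamic_cfg = {
    "dynamic": True,
    "batch_shape": [8, 3, 640, 640],
    "shapes": {
        "min": [1, 3, 640, 640],
        "opt": [8, 3, 640, 640],
        "max": [16, 3, 640, 640],
    },
}
static_cfg = {"dynamic": False, "batch_shape": [8, 3, 640, 640]}

check_batching(dynamic_cfg)  # passes: batch_shape equals shapes.opt
check_batching(static_cfg)   # passes: shapes is not consulted
```

The check uses plain dictionaries so it stays independent of any particular YAML loader; the example shape values are placeholders, not values from the repository.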