docs(project): sync en and zh docs #842

Merged (21 commits) on Aug 15, 2022

21 changes: 15 additions & 6 deletions README.md
@@ -63,9 +63,9 @@ Models can be exported and run in the following backends, and more will be compa

All kinds of modules in the SDK can be extended, such as `Transform` for image processing, `Net` for Neural Network inference, `Module` for postprocessing and so on

## Get Started
## [Documentation](https://mmdeploy.readthedocs.io/en/latest/)

Please read [getting_started.md](docs/en/get_started.md) for the basic usage of MMDeploy. We also provide tutorials about:
Please read [getting_started](docs/en/get_started.md) for the basic usage of MMDeploy. We also provide tutorials about:

- [Build](docs/en/01-how-to-build/build_from_source.md)
- [Build from Docker](docs/en/01-how-to-build/build_from_docker.md)
@@ -77,11 +77,20 @@ Please read [getting_started.md](docs/en/get_started.md) for the basic usage of
- User Guide
- [How to convert model](docs/en/02-how-to-run/convert_model.md)
- [How to write config](docs/en/02-how-to-run/write_config.md)
- [How to evaluate deployed models](docs/en/02-how-to-run/how_to_evaluate_a_model.md)
- [How to measure performance of deployed models](docs/en/02-how-to-run/how_to_measure_performance_of_models.md)
- [How to profile model](docs/en/02-how-to-run/profile_model.md)
- [How to quantize model](docs/en/02-how-to-run/quantize_model.md)
- [Useful tools](docs/en/02-how-to-run/useful_tools.md)
- Developer Guide
- [How to support new models](docs/en/06-developer-guide/support_new_model.md)
- [How to support new backends](docs/en/06-developer-guide/support_new_backend.md)
- [How to support new models](docs/en/07-developer-guide/support_new_model.md)
- [How to support new backends](docs/en/07-developer-guide/support_new_backend.md)
- [How to partition model](docs/en/07-developer-guide/partition_model.md)
- [How to test rewritten model](docs/en/07-developer-guide/test_rewritten_models.md)
- [How to test backend ops](docs/en/07-developer-guide/add_backend_ops_unittest.md)
- [How to do regression test](docs/en/07-developer-guide/regression_test.md)
- Custom Backend Ops
- [ncnn](docs/en/06-custom-ops/ncnn.md)
- [onnxruntime](docs/en/06-custom-ops/onnxruntime.md)
- [tensorrt](docs/en/06-custom-ops/tensorrt.md)
- [FAQ](docs/en/faq.md)
- [Contributing](.github/CONTRIBUTING.md)

24 changes: 18 additions & 6 deletions README_zh-CN.md
@@ -63,8 +63,9 @@ MMDeploy 是 [OpenMMLab](https://openmmlab.com/) 模型部署工具箱,**为
- Net 推理
- Module 后处理

## [快速上手](docs/zh_cn/get_started.md)
## [中文文档](https://mmdeploy.readthedocs.io/zh_CN/latest/)

- [快速上手](docs/zh_cn/get_started.md)
- [编译](docs/zh_cn/01-how-to-build/build_from_source.md)
- [Build from Docker](docs/zh_cn/01-how-to-build/build_from_docker.md)
- [Build for Linux](docs/zh_cn/01-how-to-build/linux-x86_64.md)
@@ -77,17 +78,28 @@ MMDeploy 是 [OpenMMLab](https://openmmlab.com/) 模型部署工具箱,**为
- [配置转换参数](docs/zh_cn/02-how-to-run/write_config.md)
- [量化](docs/zh_cn/02-how-to-run/quantize_model.md)
- [测试转换完成的模型](docs/zh_cn/02-how-to-run/profile_model.md)
- [工具集介绍](docs/zh_cn/02-how-to-run/useful_tools.md)
- 开发指南
- [支持新模型](docs/zh_cn/04-developer-guide/support_new_model.md)
- [增加推理 Backend](docs/zh_cn/04-developer-guide/support_new_backend.md)
- [回归测试](docs/zh_cn/04-developer-guide/do_regression_test.md)
- [支持新模型](docs/zh_cn/07-developer-guide/support_new_model.md)
- [增加推理 backend](docs/zh_cn/07-developer-guide/support_new_backend.md)
- [模型分块](docs/zh_cn/07-developer-guide/partition_model.md)
- [测试重写模型](docs/zh_cn/07-developer-guide/test_rewritten_models.md)
- [backend 算子测试](docs/zh_cn/07-developer-guide/add_backend_ops_unittest.md)
- [回归测试](docs/zh_cn/07-developer-guide/regression_test.md)
- 各 backend 自定义算子列表
- [ncnn](docs/zh_cn/06-custom-ops/ncnn.md)
- [onnxruntime](docs/zh_cn/06-custom-ops/onnxruntime.md)
- [tensorrt](docs/zh_cn/06-custom-ops/tensorrt.md)
- [FAQ](docs/zh_cn/faq.md)
- [贡献者手册](.github/CONTRIBUTING.md)

## 新人解说

- [01 术语解释、加载第一个模型](docs/zh_cn/05-tutorial/01_introduction_to_model_deployment.md)
- [02 转成 onnx](docs/zh_cn/05-tutorial/02_challenges.md)
- [01 术语解释、加载第一个模型](docs/zh_cn/tutorial/01_introduction_to_model_deployment.md)
- [02 部署常见问题](docs/zh_cn/tutorial/02_challenges.md)
- [03 torch转onnx](docs/zh_cn/tutorial/03_pytorch2onnx.md)
- [04 让torch支持更多onnx算子](docs/zh_cn/tutorial/04_onnx_custom_op.md)
- [05 调试onnx模型](docs/zh_cn/tutorial/05_onnx_model_editing.md)

## 基准与模型库

2 changes: 1 addition & 1 deletion docs/en/01-how-to-build/jetsons.md
@@ -229,7 +229,7 @@ export MMDEPLOY_DIR=$(pwd)
### Install Model Converter

Since some operators adopted by OpenMMLab codebases are not supported by TensorRT, we build custom TensorRT plugins to make up for them, such as `roi_align`, `scatternd`, etc.
You can find a full list of custom plugins from [here](../ops/tensorrt.md).
You can find a full list of custom plugins from [here](../06-custom-ops/tensorrt.md).

```shell
# build TensorRT custom operators
```
2 changes: 1 addition & 1 deletion docs/en/02-how-to-run/convert_model.md
@@ -65,7 +65,7 @@ python ./tools/deploy.py \

## How to evaluate the exported models

You can evaluate the model by referring to [how_to_evaluate_a_model](./how_to_evaluate_a_model.md).
You can evaluate the model by referring to [how_to_evaluate_a_model](./profile_model.md).

## List of supported models exportable to other backends

44 changes: 0 additions & 44 deletions docs/en/02-how-to-run/how_to_measure_performance_of_models.md

This file was deleted.

@@ -25,6 +25,9 @@ ${MODEL_CFG} \
[--metric-options ${METRIC_OPTIONS}]
[--log2file work_dirs/output.txt]
[--batch-size ${BATCH_SIZE}]
[--speed-test]
[--warmup ${WARM_UP}]
[--log-interval ${LOG_INTERVAL}]
```

## Description of all arguments
@@ -44,6 +47,9 @@ ${MODEL_CFG} \
format will be kwargs for dataset.evaluate() function.
- `--log2file`: log evaluation results (and speed) to file.
- `--batch-size`: the batch size for inference, which would override `samples_per_gpu` in data config. Default is `1`. Note that not all models support `batch_size>1`.
- `--speed-test`: whether to activate the speed test.
- `--warmup`: warm up before counting the inference time; requires `--speed-test` to be set.
- `--log-interval`: the interval between logs; requires `--speed-test` to be set.

\* Other arguments in `tools/test.py` are used for the speed test. They are not related to evaluation.

@@ -55,7 +61,8 @@ python tools/test.py \
{MMCLS_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
--model model.onnx \
--out out.pkl \
--device cuda:0
--device cpu \
--speed-test
```
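
For reference, here is a minimal sketch of the same command with the new speed-test options enabled. The warm-up count and log interval are illustrative values only, and `${DEPLOY_CFG}` stands in for the deployment config argument that is not shown in this excerpt.

```shell
# Illustrative values: 10 warm-up iterations before timing, log every 50 iterations.
python tools/test.py \
    ${DEPLOY_CFG} \
    {MMCLS_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
    --model model.onnx \
    --device cpu \
    --speed-test \
    --warmup 10 \
    --log-interval 50
```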

## Note
67 changes: 67 additions & 0 deletions docs/en/02-how-to-run/quantize_model.md
@@ -0,0 +1,67 @@
# Quantize model

## Why quantization?

The fixed-point model has many advantages over the fp32 model:

- Smaller size: an 8-bit model reduces the file size by 75%
- Thanks to the smaller model, the cache hit rate improves and inference is faster
- Chips tend to provide dedicated fixed-point acceleration instructions, which are faster and consume less energy (int8 on a common CPU requires only about 10% of the energy of fp32)

Package size and heat generation are key indicators when evaluating a mobile app; on the server side, quantization means you can keep the same QPS while using a larger model in exchange for improved accuracy.
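
As a quick sanity check of the size figure: fp32 stores 4 bytes per weight while int8 stores 1, so weight storage shrinks by 75%. A back-of-the-envelope sketch (the ~11.7M parameter count of ResNet-18 is an assumption used only for illustration):

```bash
# 4 bytes/weight (fp32) vs 1 byte/weight (int8): 1 - 1/4 = 75% smaller.
python3 -c "p = 11.7e6; print(f'fp32 ~= {p*4/2**20:.0f} MiB, int8 ~= {p/2**20:.0f} MiB')"
```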

## Post training quantization scheme

Taking the ncnn backend as an example, the complete workflow is as follows:

<div align="center">
<img src="../_static/image/quant_model.png"/>
</div>

MMDeploy generates a quantization table from the static graph (ONNX) and uses backend tools to convert the fp32 model to fixed point.

Currently, MMDeploy supports PTQ with the ncnn backend.
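
For intuition only, the sketch below shows roughly what the two stages look like when driven by hand with ncnn's own tooling. The file names are placeholders and the exact table format is an assumption; `tools/deploy.py --quant` described below performs the whole flow automatically.

```bash
# Rough sketch of the two-stage PTQ flow (placeholder file names):
# 1. A quantization (calibration) table is produced from the ONNX graph using
#    calibration images (MMDeploy drives ppq for this step).
# 2. ncnn's converter applies the table to turn the fp32 model into int8.
ncnn2int8 end2end.param end2end.bin end2end-int8.param end2end-int8.bin quant_table.txt
```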

## How to convert model

After [installing MMDeploy](../01-how-to-build/build_from_source.md), install ppq:

```bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
git checkout edbecf4 # pin to a revision that includes the required features
pip install -r requirements.txt
python3 setup.py install
```

Back in MMDeploy, enable quantization with the `--quant` option of `tools/deploy.py`.

```bash
cd /path/to/mmdeploy
export MODEL_CONFIG=/path/to/mmclassification/configs/resnet/resnet18_8xb16_cifar10.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth

python3 tools/deploy.py configs/mmcls/classification_ncnn-int8_static.py ${MODEL_CONFIG} ${MODEL_PATH} /path/to/self-test.png --work-dir work_dir --device cpu --quant --quant-image-dir /path/to/images
...
```

Description of the arguments:

| Parameter | Meaning |
| :---------------: | :--------------------------------------------------------------: |
| --quant | Enable quantization, the default value is False |
| --quant-image-dir | Calibrate dataset, use Validation Set in MODEL_CONFIG by default |

## Custom calibration dataset

The calibration set is used to calculate the quantization parameters of each layer. Some DFQ (Data-Free Quantization) methods do not even require a dataset.

- Create a new folder and just put the images in it (no directory structure, no negative examples, and no filename format required); a minimal sketch follows the table below
- The images must come from a real scenario, otherwise accuracy will drop
- Do not calibrate the model with the test dataset

| Type  | Train dataset | Validation dataset | Test dataset  | Calibration dataset |
| ----- | ------------- | ------------------ | ------------- | ------------------- |
| Usage | QAT           | PTQ                | Test accuracy | PTQ                 |
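
As referenced in the list above, a minimal sketch of preparing a calibration folder and pointing `--quant-image-dir` at it (all paths are placeholders):

```bash
# Any flat folder of real-scenario images works as the calibration set.
mkdir -p /path/to/calib-images
cp /path/to/real_scenario_images/*.jpg /path/to/calib-images/

python3 tools/deploy.py configs/mmcls/classification_ncnn-int8_static.py \
    ${MODEL_CONFIG} ${MODEL_PATH} /path/to/self-test.png \
    --work-dir work_dir --device cpu \
    --quant --quant-image-dir /path/to/calib-images
```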

It is highly recommended to [verify model precision](./profile_model.md) after quantization. [Here](../03-benchmark/quantization.md) are some quantization test results.
@@ -1,3 +1,5 @@
# Useful Tools

Apart from `deploy.py`, there are other useful tools under the `tools/` directory.

## torch2onnx
@@ -96,7 +98,8 @@ python tools/onnx2tensorrt.py \
${ONNX_PATH} \
${OUTPUT} \
--device-id 0 \
--log-level INFO
--log-level INFO \
--calib-file /path/to/file
```

### Description of all arguments
4 changes: 2 additions & 2 deletions docs/en/03-benchmark/benchmark.md
@@ -26,7 +26,7 @@ GPU: ncnn, TensorRT, PPLNN
- Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.
- Input resolution varies for different datasets of different codebases. All inputs are real images except for `mmediting` because the dataset is not large enough.

Users can directly test the speed through [model profiling](../02-how-to-run/how_to_measure_performance_of_models.md). And here is the benchmark in our environment.
Users can directly test the speed through [model profiling](../02-how-to-run/profile_model.md). And here is the benchmark in our environment.

<div style="margin-left: 25px;">
<table class="docutils">
@@ -407,7 +407,7 @@ Users can directly test the speed through [model profiling](../02-how-to-run/how

## Performance benchmark

Users can directly test the performance through [how_to_evaluate_a_model.md](../02-how-to-run/how_to_evaluate_a_model.md). And here is the benchmark in our environment.
Users can directly test the performance through [how_to_evaluate_a_model.md](../02-how-to-run/profile_model.md). And here is the benchmark in our environment.

<div style="margin-left: 25px;">
<table class="docutils">
2 changes: 1 addition & 1 deletion docs/en/03-benchmark/benchmark_edge.md
@@ -1,6 +1,6 @@
# Test on embedded device

Here are the test conclusions of our edge devices. You can directly obtain the results of your own environment with [model profiling](../02-how-to-run/how_to_evaluate_a_model.md).
Here are the test conclusions of our edge devices. You can directly obtain the results of your own environment with [model profiling](../02-how-to-run/profile_model.md).

## Software and hardware environment

27 changes: 27 additions & 0 deletions docs/en/03-benchmark/quantization.md
@@ -0,0 +1,27 @@
# Quantization test result

Currently, MMDeploy supports ncnn quantization.

## Quantize with ncnn

### mmcls

| model | dataset | fp32 top-1 (%) | int8 top-1 (%) |
| :--------------------------------------------------------------------------------------------------------------------------: | :---------: | :------------: | :------------: |
| [ResNet-18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb16_cifar10.py) | Cifar10 | 94.82 | 94.83 |
| [ResNeXt-32x4d-50](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnext/resnext50-32x4d_8xb32_in1k.py) | ImageNet-1k | 77.90 | 78.20\* |
| [MobileNet V2](https://github.com/open-mmlab/mmclassification/blob/master/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py) | ImageNet-1k | 71.86 | 71.43\* |
| [HRNet-W18\*](https://github.com/open-mmlab/mmclassification/blob/master/configs/hrnet/hrnet-w18_4xb32_in1k.py) | ImageNet-1k | 76.75 | 76.25\* |

Note:

- Because ImageNet-1k is large and ncnn has not yet released a Vulkan int8 version, only part of the test set (4000/50000) is used.
- Accuracy will fluctuate after quantization; it is normal for a classification model to gain less than 1%.

### OCR detection

| model | dataset | fp32 hmean | int8 hmean |
| :---------------------------------------------------------------------------------------------------------------: | :-------: | :--------: | :------------: |
| [PANet](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | ICDAR2015 | 0.795 | 0.792 @thr=0.9 |

Note: [mmocr](https://github.com/open-mmlab/mmocr) uses `shapely` to compute IoU, which results in a slight difference in accuracy.
2 changes: 1 addition & 1 deletion docs/en/04-supported-codebases/mmocr.md
@@ -21,7 +21,7 @@ Please refer to [install.md](https://mmocr.readthedocs.io/en/latest/install.html

Note that ncnn, pplnn, and OpenVINO only support the configs of DBNet18 for DBNet.

For the PANet with the [checkpoint](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) pretrained on ICDAR dateset, if you want to convert the model to TensorRT with 16 bits float point, please try the following script.
For the PANet with the [checkpoint](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) pretrained on ICDAR dataset, if you want to convert the model to TensorRT with 16 bits float point, please try the following script.

```python
# Copyright (c) OpenMMLab. All rights reserved.
```