add doc #1643

Merged 5 commits on Feb 22, 2022

22 changes: 22 additions & 0 deletions doc/Check_Env_CN.md
# Paddle Serving Environment Check

## Overview
Paddle Serving provides runnable one-command examples that check whether the Paddle Serving environment is installed correctly.


## How to Launch
```
python3 -m paddle_serving_server.serve check
```

|Command|Description|
|---------|----|
|check_all|Checks Paddle Inference, Pipeline Serving, and C++ Serving. Prints the results only; no log is written|
|check_pipeline|Checks Pipeline Serving. Prints the results only; no log is written|
|check_cpp|Checks C++ Serving. Prints the results only; no log is written|
|check_inference|Checks whether Paddle Inference is installed correctly. Prints the results only; no log is written|
|debug|After an error occurs, prints hint logs to the screen and also writes detailed log files|
|exit|Quits the checker|
>> **Note**:<br>
>> 1. If C++ Serving fails to start and your paddle_serving_server was built from source and installed with pip, confirm that you have run `export SERVING_BIN` to point `SERVING_BIN` at the real binary path.<br>
>> 2. You can set `export SERVING_LOG_PATH` to choose where the `debug` command writes its log; by default it is written to the current directory.
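
Since the command table includes `exit`, the check mode presumably runs as an interactive loop; below is a minimal sketch of such a session (the prompts and command ordering are illustrative, not verbatim tool output):
```
$ python3 -m paddle_serving_server.serve check
> check_all      # run every environment check; results are printed only
> debug          # after a failure: print hints and write a detailed log file
> exit
```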
107 changes: 66 additions & 41 deletions doc/TensorRT_Dynamic_Shape_CN.md
# How to Enable TensorRT and Configure Dynamic Shape
(Simplified Chinese|[English](./TensorRT_Dynamic_Shape_EN.md))

## Overview

TensorRT is a high-performance deep learning inference optimizer that provides low-latency, high-throughput inference deployment for deep learning applications.
The sections below describe how to enable TensorRT and how to configure dynamic shape, for Pipeline Serving and C++ Serving respectively.

## Paddle Inference Dynamic Shape API
```
void SetTRTDynamicShapeInfo(
    std::map<std::string, std::vector<int>> min_input_shape,
    std::map<std::string, std::vector<int>> max_input_shape,
    std::map<std::string, std::vector<int>> optim_input_shape,
    bool disable_trt_plugin_fp16 = false);
```
For the detailed API description, see the [C++](https://paddleinference.paddlepaddle.org.cn/api_reference/cxx_api_doc/Config/GPUConfig.html#tensorrt)/[Python](https://paddleinference.paddlepaddle.org.cn/api_reference/python_api_doc/Config/GPUConfig.html#tensorrt) reference.
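
As a point of reference, here is a minimal sketch of the equivalent Python calls on a `paddle.inference.Config`; the model paths, the tensor name `x`, and all shapes below are illustrative placeholders rather than values from a real model:
```
from paddle.inference import Config

# illustrative model files; substitute your own exported model
config = Config("serving_server/model.pdmodel", "serving_server/model.pdiparams")
config.enable_use_gpu(2000, 0)  # initial GPU memory pool (MB), GPU id
config.enable_tensorrt_engine(max_batch_size=32, min_subgraph_size=3)

# One entry per tensor name. TensorRT accepts any shape between min and
# max, and tunes its kernels for the opt shape.
config.set_trt_dynamic_shape_info(
    {"x": [1, 3, 224, 224]},    # min_input_shape
    {"x": [8, 3, 1024, 1024]},  # max_input_shape
    {"x": [1, 3, 512, 512]},    # optim_input_shape
)
```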

## C++ Serving

**1. Enabling TensorRT in C++ Serving**

Add `--use_trt` to the C++ Serving launch command:

```
python -m paddle_serving_server.serve \
--model serving_server \
--thread 2 --port 9000 \
--gpu_ids 0 \
--use_trt \
--precision FP16
```

**2. Configuring Dynamic Shape in C++ Serving**

Modify the code below in `**/paddle_inference/paddle/include/paddle_engine.h`:

```
// (Collapsed in this diff view: the dynamic shape configuration changes
// to paddle_engine.h are not shown here.)
```


## Pipeline Serving

**1. Enabling TensorRT in Pipeline Serving**

In the config.yml file under the example directory, set `device_type: 2` and configure the GPU cards to use with `devices: "0,1,2,3"`, as sketched below.
>> **Note**: TensorRT must be used together with a GPU.
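
A sketch of the relevant config.yml fragment, assuming the `op`/`local_service_conf` layout used by the Pipeline examples (the op name `det` and every other field here are placeholders):
```
op:
    det:
        local_service_conf:
            device_type: 2        # 2 selects GPU with TensorRT
            devices: "0,1,2,3"    # GPU card ids to use
```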

**2. Configuring Dynamic Shape in Pipeline Serving**

In web_service.py under the example directory, dynamic shape configuration can be added for each op by implementing `def set_dynamic_shape_info(self):`.

For example:
```
def set_dynamic_shape_info(self):
    # Smallest shapes each named tensor may take at runtime.
    min_input_shape = {
        "x": [1, 3, 50, 50],
        "conv2d_182.tmp_0": [1, 1, 20, 20],
        "nearest_interp_v2_2.tmp_0": [1, 1, 20, 20],
        "nearest_interp_v2_3.tmp_0": [1, 1, 20, 20],
        "nearest_interp_v2_4.tmp_0": [1, 1, 20, 20],
        "nearest_interp_v2_5.tmp_0": [1, 1, 20, 20]
    }
    # Largest shapes each tensor may take at runtime.
    max_input_shape = {
        "x": [1, 3, 1536, 1536],
        "conv2d_182.tmp_0": [20, 200, 960, 960],
        "nearest_interp_v2_2.tmp_0": [20, 200, 960, 960],
        "nearest_interp_v2_3.tmp_0": [20, 200, 960, 960],
        "nearest_interp_v2_4.tmp_0": [20, 200, 960, 960],
        "nearest_interp_v2_5.tmp_0": [20, 200, 960, 960],
    }
    # Typical shapes; TensorRT tunes its kernels for these.
    opt_input_shape = {
        "x": [1, 3, 960, 960],
        "conv2d_182.tmp_0": [3, 96, 240, 240],
        "nearest_interp_v2_2.tmp_0": [3, 96, 240, 240],
        "nearest_interp_v2_3.tmp_0": [3, 24, 240, 240],
        "nearest_interp_v2_4.tmp_0": [3, 24, 240, 240],
        "nearest_interp_v2_5.tmp_0": [3, 24, 240, 240],
    }
    self.dynamic_shape_info = {
        "min_input_shape": min_input_shape,
        "max_input_shape": max_input_shape,
        "opt_input_shape": opt_input_shape,
    }

```
See [Pipeline OCR](../examples/Pipeline/PaddleOCR/ocr/) for a complete example.
>> **Note**: Because different models require different dynamic shape settings, there is no universal dynamic shape configuration. If Pipeline Serving reports a shape-related error, load the model with [netron](https://netron.app/) and inspect the input and output shapes of each op; then, guided by the error message, add the corresponding dynamic shape configuration code to web_service.py.