From 2107c78cb48372e60e3f39cb2d5d2ebbbc7463ed Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 12:47:34 +0800
Subject: [PATCH 1/9] add doc

---
 doc/MULTI_SERVICE_ON_ONE_GPU_CN.md | 14 ++++++++++++++
 doc/PERFORMANCE_OPTIM_CN.md | 13 +++++++++++++
 2 files changed, 27 insertions(+)
 create mode 100644 doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
 create mode 100644 doc/PERFORMANCE_OPTIM_CN.md

diff --git a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
new file mode 100644
index 000000000..fd32fd5b1
--- /dev/null
+++ b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
@@ -0,0 +1,14 @@
+# Multiple Inference Services on a Single GPU Card
+
+When client requests arrive infrequently, the server's compute resources, especially the GPU, sit idle and are wasted. In this case, you can start multiple inference services on the server to improve resource utilization. Paddle Serving supports deploying multiple inference services on a single GPU card: when starting each service, bind it to the card with the --gpu_ids argument, so that several services share the same card.
+
+For example:
+
+```shell
+python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
+```
+
+This deploys both the bert example and the imagenet example on card 0.
+
+**Note:** Inference inside a single GPU card is still executed serially; this approach reduces the card's idle time on the server side.
\ No newline at end of file
diff --git a/doc/PERFORMANCE_OPTIM_CN.md b/doc/PERFORMANCE_OPTIM_CN.md
new file mode 100644
index 000000000..e16dba780
--- /dev/null
+++ b/doc/PERFORMANCE_OPTIM_CN.md
@@ -0,0 +1,13 @@
+# Performance Optimization
+
+Because model structures differ, different predictions consume different amounts of compute. For an online inference service, a model that needs little compute spends a relatively large share of its time on communication and is called a communication-intensive service; a model that needs a lot of compute spends most of its time on inference and is called a computation-intensive service. The two service types can be optimized in different ways according to actual needs.
+
+The simplest way to tell which type a given inference service is, is to look at the time breakdown. Paddle Serving provides a [Timeline tool](../python/examples/util/README_CN.md) that visualizes the time spent in each stage of the inference service.
+
+For communication-intensive inference services, requests can be aggregated: within a tolerable latency limit, merge multiple inference requests into one batch for prediction.
+
+For computation-intensive inference services, use a GPU inference service instead of a CPU one, or increase the number of GPU cards for the GPU inference service.
+
+Under the same conditions, the communication time of the HTTP inference service provided by Paddle Serving is greater than that of the RPC inference service.
+
+If the model is large and the inference service occupies a lot of memory or GPU memory, you can enable memory/GPU-memory optimization by setting the --mem_optim option to True.

From 873d23ccd33cc9acac8e6737365eeb0b17dbbfbe Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 13:05:56 +0800
Subject: [PATCH 2/9] add link

---
 README.md | 2 ++
 README_CN.md | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/README.md b/README.md
index adc2e8d44..82dabd86e 100644
--- a/README.md
+++ b/README.md
@@ -243,6 +243,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv

### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util)
+- [How to optimize performance?(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
+- [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)

diff --git a/README_CN.md b/README_CN.md
index ddb06309a..1858b5361 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -283,6 +283,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv

### About Paddle Serving performance
- [How to profile Paddle Serving performance?](python/examples/util/)
+- [How to optimize performance?](doc/PERFORMANCE_OPTIM_CN.md)
+- [Launch multiple inference services on one GPU](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [CPU Benchmarks](doc/BENCHMARKING.md)
- [GPU Benchmarks](doc/GPU_BENCHMARKING.md)

From d4451d920cf485d5c5b52a3c14292db2b0f8186f Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 15:40:38 +0800
Subject: [PATCH 3/9] update doc

---
 README.md | 6 ++++--
 README_CN.md | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 82dabd86e..aa7cff637 100644
--- a/README.md
+++ b/README.md
@@ -37,8 +37,9 @@ We consider deploying deep learning inference service online to be a user-facing

We highly recommend you to run Paddle Serving in Docker, please
visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md) ```shell -pip install paddle-serving-client -pip install paddle-serving-server +pip install paddle-serving-client +pip install paddle-serving-server # CPU +pip install paddle-serving-server-gpu # GPU ```

Quick Start Example

@@ -128,6 +129,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天
- **Description**:
``` shell
Image classification trained with Imagenet dataset. A label and corresponding probability will be returned.
+Note: This demo needs paddle-serving-server-gpu.
```
- **Download Servable Package**:

diff --git a/README_CN.md b/README_CN.md
index 1858b5361..85a4056a0 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -39,7 +39,8 @@ Paddle Serving aims to help deep learning developers easily deploy online inference services

```shell
pip install paddle-serving-client
-pip install paddle-serving-server
+pip install paddle-serving-server # CPU
+pip install paddle-serving-server-gpu # GPU
```

Quick Start Example

@@ -167,6 +168,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天
- **Description**:
``` shell
An image classification model trained on the Imagenet dataset; the service returns a label and its probability
+Note: this demo requires installing paddle-serving-server-gpu
```
- **Download Servable Package**:

From 6f758119504c2967cb2579eab6dcec3763887510 Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 15:46:35 +0800
Subject: [PATCH 4/9] update doc

---
 doc/PERFORMANCE_OPTIM_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/PERFORMANCE_OPTIM_CN.md b/doc/PERFORMANCE_OPTIM_CN.md
index e16dba780..dd17bc8af 100644
--- a/doc/PERFORMANCE_OPTIM_CN.md
+++ b/doc/PERFORMANCE_OPTIM_CN.md
@@ -8,6 +8,6 @@

For computation-intensive inference services, use a GPU inference service instead of a CPU one, or increase the number of GPU cards for the GPU inference service.

-Under the same conditions, the communication time of the HTTP inference service provided by Paddle Serving is greater than that of the RPC inference service.
+Under the same conditions, the communication time of the HTTP inference service provided by Paddle Serving is greater than that of the RPC inference service, so for communication-intensive services, prefer RPC as the communication mode.

If the model is large and the inference service occupies a lot of memory or GPU memory, you can enable memory/GPU-memory optimization by setting the --mem_optim option to True.

From 22e7c475e307e4383c396294ee4aaeaf0482eeba Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 05:37:24 +0000
Subject: [PATCH 5/9] fix show_profile for py3

---
 python/examples/util/show_profile.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/examples/util/show_profile.py b/python/examples/util/show_profile.py
index c3e8adc0c..9153d9393 100644
--- a/python/examples/util/show_profile.py
+++ b/python/examples/util/show_profile.py
@@ -10,7 +10,7 @@ def prase(line):
     profile_list = line.split(" ")
     num = len(profile_list)
-    for idx in range(num / 2):
+    for idx in range(int(num / 2)):
         profile_0_list = profile_list[idx * 2].split(":")
         profile_1_list = profile_list[idx * 2 + 1].split(":")
         if len(profile_0_list[0].split("_")) == 2:
@@ -18,7 +18,7 @@
         else:
             name = profile_0_list[0].split("_")[0] + "_" + profile_0_list[
                 0].split("_")[1]
-        cost = long(profile_1_list[1]) - long(profile_0_list[1])
+        cost = int(profile_1_list[1]) - int(profile_0_list[1])
         if name not in time_dict:
             time_dict[name] = cost
         else:

From 58e7dda48c7ec0318412b8de82583c2c644f5c21 Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 17:18:54 +0800
Subject: [PATCH 6/9] fix conflict

---
 README.md | 1 +
 README_CN.md | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/README.md b/README.md
index aa7cff637..8575194a3 100644
--- a/README.md
+++ b/README.md
@@ -42,6 +42,7 @@
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

+You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.

Quick Start Example

### Boston House Price Prediction model

diff --git a/README_CN.md b/README_CN.md
index 85a4056a0..8a19c5cbd 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -43,6 +43,8 @@
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

+You may need to use a domestic mirror source (for example the Tsinghua source: add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.
+

Quick Start Example

Boston House Price Prediction

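PATCH 5 above ports show_profile.py from Python 2 to Python 3. The two edits map to two semantic changes in Python 3: `/` is true division and returns a float (which `range()` rejects), and the separate `long` type was folded into `int`. Below is a minimal sketch of the same parsing loop, with hypothetical profile-line values, to show why the casts are needed:

```python
# Illustration of the PATCH 5 fix; the profile-line format and the
# timestamp values here are hypothetical stand-ins.
line = "infer_0:1586241600000000 infer_1:1586241600035000"
profile_list = line.split(" ")
num = len(profile_list)  # 2

# Python 2: num / 2 == 1 (an int), so range(num / 2) worked.
# Python 3: num / 2 == 1.0 (a float), and range(1.0) raises TypeError,
# hence the explicit int() cast added by the patch.
for idx in range(int(num / 2)):
    start = profile_list[idx * 2].split(":")
    end = profile_list[idx * 2 + 1].split(":")
    # Python 3 removed long(); int() already handles arbitrarily
    # large microsecond timestamps.
    cost = int(end[1]) - int(start[1])
    print(cost)  # 35000
```

An equivalent fix would be floor division, `range(num // 2)`, which yields an int in both Python versions.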
From 2e68c9823331f1c806b99f0a9d6d0e27dd6dab9e Mon Sep 17 00:00:00 2001 From: MRXLT Date: Tue, 7 Apr 2020 17:53:32 +0800 Subject: [PATCH 7/9] fix conflict --- README.md | 17 ++++++++++++++++- README_CN.md | 14 ++++++++++++++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8575194a3..38e6fcd12 100644 --- a/README.md +++ b/README.md @@ -35,6 +35,18 @@ We consider deploying deep learning inference service online to be a user-facing

Installation

We highly recommend you to run Paddle Serving in Docker, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
+```
+# Run CPU Docker
+docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker exec -it test bash
+```
+```
+# Run GPU Docker
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker exec -it test bash
+```

```shell
pip install paddle-serving-client
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.
+You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.
+
+The client package supports CentOS 7 and Ubuntu 18, or you can use the HTTP service without installing the client.
+

Quick Start Example

### Boston House Price Prediction model

diff --git a/README_CN.md b/README_CN.md
index 8a19c5cbd..d7dd1da7e 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -37,6 +37,18 @@ Paddle Serving aims to help deep learning developers easily deploy online inference services

It is strongly recommended that you build Paddle Serving inside Docker; see [How to Run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md)

+```
+# Start CPU Docker
+docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker exec -it test bash
+```
+```
+# Start GPU Docker
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker exec -it test bash
+```
```shell
pip install paddle-serving-client
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

You may need to use a domestic mirror source (for example the Tsinghua source: add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.

+The client package supports CentOS 7 and Ubuntu 18; alternatively, you can use the HTTP service, in which case there is no need to install the client.
+

Quick Start Example

Boston House Price Prediction

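PATCH 4 above recommends RPC over HTTP for communication-intensive services, and PATCH 7 documents client installation. For contrast with the curl-based HTTP examples in the README, here is a minimal sketch of the RPC client path, assuming the uci_housing quick-start server is already running on port 9292 and that the client config path below matches your downloaded package; the feature values are hypothetical placeholders:

```python
from paddle_serving_client import Client

client = Client()
# Client-side config generated when the servable model was saved.
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One 13-dimensional normalized feature vector (hypothetical values).
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501, -0.0937,
        -0.0406, -0.0678, -0.0100, -0.0317, 0.0945, -0.0929]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map["price"])
```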
From cb9daeb859f961abbc105e282ad54401dfe0a311 Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 17:57:15 +0800
Subject: [PATCH 8/9] update readme

---
 README.md | 2 +-
 README_CN.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 38e6fcd12..5c8b12ed6 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.
+You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.

The client package supports CentOS 7 and Ubuntu 18, or you can use the HTTP service without installing the client.

diff --git a/README_CN.md b/README_CN.md
index d7dd1da7e..ee83758a1 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -55,7 +55,7 @@ pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

-You may need to use a domestic mirror source (for example the Tsinghua source: add "-i https://pypi.tuna.tsinghua.edu.cn/simple" to the pip command) to speed up the download.
+You may need to use a domestic mirror source (for example the Tsinghua source: add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.

The client package supports CentOS 7 and Ubuntu 18; alternatively, you can use the HTTP service, in which case there is no need to install the client.

From dbbb01e2a8964448e627b3435d2e9e96f8aea35c Mon Sep 17 00:00:00 2001
From: MRXLT
Date: Tue, 7 Apr 2020 19:50:15 +0800
Subject: [PATCH 9/9] shape check

---
 doc/MULTI_SERVICE_ON_ONE_GPU_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
index fd32fd5b1..5095ad849 100644
--- a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
+++ b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
@@ -11,4 +11,4 @@ python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393

This deploys both the bert example and the imagenet example on card 0.

-**Note:** Inference inside a single GPU card is still executed serially; this approach reduces the card's idle time on the server side.
\ No newline at end of file
+**Note:** Inference inside a single GPU card is still executed serially; this approach reduces the card's idle time on the server side.
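Finally, doc/PERFORMANCE_OPTIM_CN.md (PATCHES 1 and 4 above) suggests aggregating requests for communication-intensive services: within a tolerable latency budget, merge several pending requests into one batched call instead of sending them one by one. A client-side sketch of that idea follows, assuming the same uci_housing setup as the RPC example above; treat the batch interface (a list of feed dicts passed as one predict call) as an assumption to verify against your installed paddle-serving-client version, and the feature values as hypothetical:

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# Several pending requests merged into one batch: one RPC round trip
# instead of len(batch) round trips, traded against a little latency.
batch = [
    {"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501, -0.0937,
           -0.0406, -0.0678, -0.0100, -0.0317, 0.0945, -0.0929]},
    {"x": [0.0200, -0.0400, 0.1200, -0.0300, 0.0100, -0.0600, -0.0500,
           -0.0200, -0.0100, -0.0300, -0.0200, 0.0800, -0.0700]},
]
fetch_map = client.predict(feed=batch, fetch=["price"])
print(fetch_map["price"])
```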