From 4183a124a4cc683127c2e613287aad8ec6bde13e Mon Sep 17 00:00:00 2001
From: Yancey1989 <yancey1989@gmail.com>
Date: Wed, 10 Jan 2018 19:47:48 +0800
Subject: [PATCH 1/3] add cluster training benchmark design

---
 benchmark/cluster/README.md | 54 +++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 benchmark/cluster/README.md

diff --git a/benchmark/cluster/README.md b/benchmark/cluster/README.md
new file mode 100644
index 0000000000000..d2c68b6ada80a
--- /dev/null
+++ b/benchmark/cluster/README.md
@@ -0,0 +1,54 @@
+# Cluster Training Benchmark
+
+## Setup
+
+- Platform
+  - Kubernetes: v1.6.2
+  - Linux Kernel: v3.10.0
+
+- Resource
+  - CPU: 10 Cores per Pod
+  - Memory: 5GB per Pod
+
+- Docker Image
+
+  We use different base Docker Image to run the benchmark on Kubernetes:
+  - PaddlePaddle v2: paddlepaddle/paddle:latest
+  - PaddlePaddle Fluid: paddlepaddle/paddle:latest
+  - TensorFlow: tensorflow/tensorflow:latest
+
+- Model
+  A digits recognize model and MNIST dataset is used in this benchmark.
+
+## Compare the Performance
+
+- Variable
+  - Batch Size of training data.
+  - PServer count of the training job.
+
+- Invariant
+  - The number of trainers.
+  - The resource of trainer/pserver Pod.
+
+- Metrics
+  - We use `batch/sec` to measure the training performance.
+
+### BatchSize
+
+| BatchSize | 64 | 128 | 256 | 512 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+
+### PServer Count
+
+| PServer Count | 10 | 20 | 40 | 80 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+
+## Reproduce the benchmark
+
+TODO

From 97e480aa10118cc46d138c3956a8bda647424588 Mon Sep 17 00:00:00 2001
From: Yancey1989 <yancey1989@gmail.com>
Date: Thu, 11 Jan 2018 17:30:05 +0800
Subject: [PATCH 2/3] update according to review comments

---
 benchmark/cluster/README.md | 48 +++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/benchmark/cluster/README.md b/benchmark/cluster/README.md
index d2c68b6ada80a..674e04df85b79 100644
--- a/benchmark/cluster/README.md
+++ b/benchmark/cluster/README.md
@@ -13,42 +13,66 @@
 - Docker Image
 
   We use different base Docker Image to run the benchmark on Kubernetes:
-  - PaddlePaddle v2: paddlepaddle/paddle:latest
-  - PaddlePaddle Fluid: paddlepaddle/paddle:latest
-  - TensorFlow: tensorflow/tensorflow:latest
+  - PaddlePaddle v2: paddlepaddle/paddle:[commit-id]
+  - PaddlePaddle Fluid: paddlepaddle/paddle:0.10.0
+  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0
 
 - Model
-  A digits recognize model and MNIST dataset is used in this benchmark.
+  The VGG16 model is used in this benchmark.
 
-## Compare the Performance
+## Cases
 
 - Variable
   - Batch Size of training data.
   - PServer count of the training job.
+  - The number of trainers.
 
 - Invariant
-  - The number of trainers.
   - The resource of trainer/pserver Pod.
 
-- Metrics
-  - We use `batch/sec` to measure the training performance.
+### Measure the Performance for Different Batch Sizes
 
-### BatchSize
+- PServer Count: 40
+- Trainer Count: 100
+- Metrics: mini-batch / sec
 
-| BatchSize | 64 | 128 | 256 | 512 |
+| Batch Size | 32 | 64 | 128 | 256 |
 | -- | -- | -- | -- | -- |
 | PaddlePaddle Fluid | - | - | - | - |
 | PaddlePaddle v2 | - | - | - | - |
 | TensorFlow | - | - | - | - |
 
-### PServer Count
+### Measure the Performance for Different PServer Counts
+
+- Trainer Count: 100
+- Batch Size: 64
+- Metrics: mini-batch / sec
 
-| PServer Count | 10 | 20 | 40 | 80 |
+| PServer Count | 10 | 20 | 40 | 60 |
 | -- | -- | -- | -- | -- |
 | PaddlePaddle Fluid | - | - | - | - |
 | PaddlePaddle v2 | - | - | - | - |
 | TensorFlow | - | - | - | - |
 
+### Measure Parallel Efficiency by Increasing the Trainer Count
+
+- PServer Count: 20
+- Batch Size: 64
+- Metrics:
+
+$S = \frac{T_1}{T_N}$
+
+where $S$ is the speedup, i.e. the ratio of $T_1$ to $T_N$, the training times with 1 and $N$ trainers respectively.
+The parallel efficiency is:
+
+$E = \frac{S}{N}$
+
+| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
+| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
+| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |
+
 ## Reproduce the benchmark
 
 TODO
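The README's parallel-efficiency section defines a speedup $S$ and a parallel efficiency $E$. Both can be computed from measured training times. The Python sketch below is illustrative only. The helper names `speedup` and `parallel_efficiency` are hypothetical and are not part of PaddlePaddle or TensorFlow. The sketch assumes $T_1$ and $T_N$ are wall-clock times for the same total amount of training work. The timings in the usage example are made-up placeholders, not benchmark results.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup S = T1 / TN.

    t1: training time with 1 trainer; tn: training time with N trainers.
    Assumption: both times cover the same total training workload.
    """
    if t1 <= 0 or tn <= 0:
        raise ValueError("training times must be positive")
    return t1 / tn


def parallel_efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency E = S / N for N trainers."""
    if n < 1:
        raise ValueError("trainer count must be at least 1")
    return speedup(t1, tn) / n


if __name__ == "__main__":
    # Hypothetical timings in seconds (placeholders, NOT measured results).
    t1 = 1000.0
    timings = {10: 125.0, 20: 100.0}  # trainer count -> training time
    for n, tn in sorted(timings.items()):
        print(f"N={n}: S={speedup(t1, tn):.2f}, E={parallel_efficiency(t1, tn, n):.2f}")
```

An efficiency close to 1.0 means that adding trainers scales almost linearly. Lower values typically reflect overhead such as communication with the parameter servers.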

From c86e744e9db36cccf89cb09922529e88e6e25fed Mon Sep 17 00:00:00 2001
From: Yancey1989 <yancey1989@gmail.com>
Date: Thu, 11 Jan 2018 18:53:10 +0800
Subject: [PATCH 3/3] update according to review comments

---
 benchmark/cluster/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/benchmark/cluster/README.md b/benchmark/cluster/README.md
index 674e04df85b79..b619613ea7a5b 100644
--- a/benchmark/cluster/README.md
+++ b/benchmark/cluster/README.md
@@ -13,8 +13,8 @@
 - Docker Image
 
   We use different base Docker Image to run the benchmark on Kubernetes:
-  - PaddlePaddle v2: paddlepaddle/paddle:[commit-id]
-  - PaddlePaddle Fluid: paddlepaddle/paddle:0.10.0
+  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
+  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
   - TensorFlow: tensorflow/tensorflow:1.5.0-rc0
 
 - Model