From 1a5a36e26bf83cc00b864e47699f408cb5f2c186 Mon Sep 17 00:00:00 2001
From: qoo332001 <sean0651101@gmail.com>
Date: Wed, 24 May 2023 03:42:43 +0800
Subject: [PATCH 1/3] add experiment partitionMigrateTime

---
 docs/balancer/README.md                       |   6 +
 .../experiment_partitionMigrateTime.md        | 180 ++++++++++++++++++
 2 files changed, 186 insertions(+)
 create mode 100644 docs/balancer/experiment_partitionMigrateTime.md

diff --git a/docs/balancer/README.md b/docs/balancer/README.md
index ee168bd59b..4435d47c4b 100644
--- a/docs/balancer/README.md
+++ b/docs/balancer/README.md
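@@ -13,3 +13,9 @@ Astraea Balancer is a Kafka broker-side load optimization framework that, by using
 * Astraea Balancer experiment reports
   * [Experiment report #1](experiment_1.md)
   * [Experiment report #2](experiment_2.md)
+
+## Cost Estimation
+
+* Cost estimation experiment reports
+  * [Disk space limit experiment](experiment_brokerDiskSpace.md): Migrating Kafka partitions incurs costs. Before migrating, estimate the broker and disk space the migration may occupy and cap it, so that the migration does not take up too much storage.
+  * [Migration time limit experiment](experiment_partitionMigrateTime.md): Migrating Kafka partitions incurs costs. Before migrating, estimate how long the migration may take and cap it, so that the migration does not take too long.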
diff --git a/docs/balancer/experiment_partitionMigrateTime.md b/docs/balancer/experiment_partitionMigrateTime.md
new file mode 100644
index 0000000000..15370e40ea
--- /dev/null
+++ b/docs/balancer/experiment_partitionMigrateTime.md
@@ -0,0 +1,180 @@
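+# Migration Time Limit Experiment
+
+This test shows that the current migration cost estimation and limiting [#1665](https://github.com/skiptests/astraea/pull/1665)
+can compute how long a migration is likely to take before the load-balancing plan is executed, and can cap that migration time.
+
+## Test Scenario
+
+* We use the project's [WebAPI](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/web_server/web_api_topics_chinese.md#%E5%BB%BA%E7%AB%8B-topic) tool to create a load-imbalanced scenario on the test cluster.
+
+* This report limits the migration time during the migration and computes the error between the limited migration time and the actual execution time.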
+
+
+
+## Cluster Hardware Environment
+
+The network topology is shown below:
+
+```
+                                     [500 Mbits Router]
+                                    ┌──────────────────┐
+               [10 Gbits Switch]    │                  │
+   ┌─────┬─────┬─────┬─────┬─────┬──┴──┬──┬──┬──┬──┐   │
+   B1   B2    B3    B4    B5    B6   P1 P2 P3 P4 P5 Balancer
+```
+
+Software running on each machine:
+
+| server/client   | broker1                                            | broker2~6                   | producer1~5                     | Balancer              |
+| --------------- | -------------------------------------------------- | --------------------------- | ------------------------------- | --------------------- |
+| Tools/software  | Kafka Broker, Zookeeper, Prometheus, Node Exporter | Kafka Broker, Node Exporter | Performance Tool, Node Exporter | Astraea Balancer      |
+
+The table below lists the hardware specifications of B1, B2, B3, B4, B5, and B6:
+
+| Component    | Model                                                        |
+| ------------ | ------------------------------------------------------------ |
+| CPU          | Intel i9-12900K CPU 3.2G(5.2G)/30M/UHD770/125W               |
+| Motherboard  | ASUS ROG STRIX Z690-G GAMING WIFI (M-ATX/1H1P/Intel 2.5G+Wi-Fi 6E), 14+1 phase digital power |
+| Memory       | Micron Crucial 32GB DDR5 4800                                |
+| Disk         | ADATA XPG SX8200Pro 1TB/M.2 2280/read 3500M/write 3000M/TLC/SMI controller * 3 |
+| Network card | XG-C100C [10 Gigabit port] RJ45 single-port high-speed network card/PCIe interface |
+
+The table below lists the hardware specifications of the machine running Astraea Balancer:
+
+| Component   | Model                                                |
+| ----------- | ---------------------------------------------------- |
+| CPU         | 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz       |
+| Memory      | KLEVV DIMM DDR4 Synchronous 2667 MHz (0.4 ns) 16GB*2 |
+| Motherboard | MAG B560 TOMAHAWK WIFI (MS-7D15)                     |
+
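+## Cluster Software Environment
+
+This experiment includes:
+
+* 6 Apache Kafka Broker nodes (version 3.4.0).
+  * Each node has 3 log dirs, each on an SSD with 844GB of space.
+* 1 KRaft controller node (version 3.4.0).
+* 5 Performance Tool instances producing data.
+
+The steps to build the environment are as follows:
+
+### Creating the Kafka Cluster
+
+Build the cluster according to the environment described above. You can use the project's
+[./docker/start_controller.sh](https://github.com/skiptests/astraea/blob/main/docs/run_kafka_broker.md#broker-with-kraft) to create the cluster.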
+
+## Performance Metrics Collection
+
+All performance metrics in this experiment come from each Kafka Broker's JMX information, which jmx_exporter exports in a format that Prometheus accepts
+and which is then plotted in Grafana. We also tracked actual hardware resource usage during the experiment, using the node exporter running on each machine and Prometheus
+to collect low-level hardware performance data.
+
+You can use the project's
+[./docker/start_node_exporter.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_node_exporter.md),
+[./docker/start_prometheus.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_prometheus.md) and
+[./docker/start_grafana.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_grafana.md) to build the monitoring environment.
+
+The Dashboard used in this experiment can be found [here](resources/experiment_1_grafana-1663659783116.json).
+
+## Running the Experiment
+
+1. First, get the Astraea project.
+
+```shell
+git clone https://github.com/skiptests/astraea.git
+cd astraea
+```
+
+2. Next, start the Astraea Web Service, which provides a set of features that help us manage and operate Kafka.
+
+3. Run `./gradlew run --args="web --bootstrap.servers <broker-addresses>"` to start the web service, where `<broker-addresses>` is
+   the network address at which Kafka serves clients.
+
+4. Once it is running, run
+
+```shell
+curl -X POST http://localhost:8001/topics \
+  -H "Content-Type: application/json" \
+  -d '{ "topics": [ { "name":"imbalance-topic", "partitions": 250, "replicas": 2, "probability": 0.2 } ] }'
+```
+
+This asks the web service to create a load-imbalanced topic named `imbalance-topic`. In this scenario it has 250 partitions (leaders), each with 2 replicas, for a total of 500 replicas.
+
+
+
+5. Next, start feeding data into the cluster. On machines P1~P5, run the following command to start the [Astraea Performance Tool](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/performance_benchmark.md)
+
+```shell
+./start_app.sh performance --bootstrap.servers 192.168.103.177:25655 --topics imbalance-topic --run.until 5m --producers 10 --consumers 0 --value.size 10KiB --configs acks=0
+```
+
+
+
+### Without a Cost Limit
+
+1. After the producers finish sending data, run the following command to rebalance the cluster
+
+```shell
+curl -X POST http://localhost:8001/topics \
+  -H "Content-Type: application/json" \
+  -d '{
+  	"timeout": "30s",
+  	"balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
+  	"balancerConfig": {
+  	  "shuffle.tweaker.min.step": "1",
+  	  "shuffle.tweaker.max.step": "10"
+ 	 },
+  	"clusterCosts": [
+        {
+        	"cost": "org.astraea.common.cost.ReplicaLeaderCost",
+        	"weight": 1
+        }
+    ]
+}'
+```
+
+
+
+We ran the same scenario several times without limiting the migration time:
+
+| Run                       | 1    | 2    | 3    |
+| ------------------------- | ---- | ---- | ---- |
+| Actual migration time (s) | 443  | 411  | 437  |
+
+
+
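+### Limiting the Migration Time
+
+1. After the producers finish sending data, run the following command. This time the migration time is limited to 300 seconds, and we check how far the actual migration time deviates from the limit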
+
+```shell
+curl -X POST http://localhost:8001/topics \
+  -H "Content-Type: application/json" \
+  -d '{
+  	"timeout": "30s",
+  	"balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
+  	"balancerConfig": {
+  	  "shuffle.tweaker.min.step": "1",
+  	  "shuffle.tweaker.max.step": "10"
+ 	 },
+  	"clusterCosts": [
+        {
+        	"cost": "org.astraea.common.cost.ReplicaLeaderCost",
+        	"weight": 1
+        }
+    ],
+    "costConfig": {
+    	"max.migrated.time.limit": "300s"
+    }
+}'
+```
+
+
+
+| Run                       | 1           | 2            | 3             | 4             | 5             |
+| ------------------------- | ----------- | ------------ | ------------- | ------------- | ------------- |
+| Migration time limit (s)  | 300         | 300          | 300           | 300           | 300           |
+| Actual migration time (s) | 345         | 345          | 315           | 285           | 330           |
+| Error                     | 0.130434783 | 0.1304347826 | 0.04761904762 | 0.05263157895 | 0.09090909091 |
+
+### 

From 4ef6b93db735dda8ceb41af182e3a15453fd8cf2 Mon Sep 17 00:00:00 2001
From: qoo332001 <sean0651101@gmail.com>
Date: Fri, 26 May 2023 00:07:06 +0800
Subject: [PATCH 2/3] update docs

---
 .../experiment_partitionMigrateTime.md        | 83 ++++++++++---------
 1 file changed, 44 insertions(+), 39 deletions(-)
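Note on the error row of the results table updated in this patch: the values appear to follow |actual - planned| / actual, where "planned" is the per-run time in the first data row of the table. The document does not state this formula; it is inferred from the numbers, so the Python sketch below is an assumption rather than the author's method. The name `relative_error` is made up for illustration, and the figures are copied from the table.

```python
# Relative error between the planned and the actual migration time.
# Figures are copied from the results table added by this patch.
planned_s = [400, 399, 399, 398, 338]
actual_s = [406, 389, 404, 387, 341]


def relative_error(planned: float, actual: float) -> float:
    """Return |actual - planned| / actual (formula inferred from the table)."""
    return abs(actual - planned) / actual


errors = [relative_error(p, a) for p, a in zip(planned_s, actual_s)]
for run, err in enumerate(errors, start=1):
    print(f"run {run}: {err:.4f}")
```

The same formula also reproduces the error row that this patch removes: a limit of 300 seconds against an actual 345 seconds gives about 0.1304.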

diff --git a/docs/balancer/experiment_partitionMigrateTime.md b/docs/balancer/experiment_partitionMigrateTime.md
index 15370e40ea..091fb1b89c 100644
--- a/docs/balancer/experiment_partitionMigrateTime.md
+++ b/docs/balancer/experiment_partitionMigrateTime.md
@@ -105,7 +105,7 @@ curl -X POST http://localhost:8001/topics \
 5. Next, start feeding data into the cluster. On machines P1~P5, run the following command to start the [Astraea Performance Tool](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/performance_benchmark.md)
 
 ```shell
-./start_app.sh performance --bootstrap.servers 192.168.103.177:25655 --topics imbalance-topic --run.until 5m --producers 10 --consumers 0 --value.size 10KiB --configs acks=0
+./start_app.sh performance --bootstrap.servers 192.168.103.177:25655 --topics imbalance-topic --run.until 15m --producers 10 --consumers 0 --value.size 10KiB --configs acks=0
 ```
 
 
@@ -115,22 +115,25 @@ curl -X POST http://localhost:8001/topics \
 1. After the producers finish sending data, run the following command to rebalance the cluster
 
 ```shell
-curl -X POST http://localhost:8001/topics \
+curl -X POST http://localhost:8001/balancer \
   -H "Content-Type: application/json" \
   -d '{
-  	"timeout": "30s",
-  	"balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
-  	"balancerConfig": {
-  	  "shuffle.tweaker.min.step": "1",
-  	  "shuffle.tweaker.max.step": "10"
- 	 },
-  	"clusterCosts": [
-        {
-        	"cost": "org.astraea.common.cost.ReplicaLeaderCost",
-        	"weight": 1
-        }
+      "timeout": "60s",
+      "balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
+      "balancerConfig": {
+      "shuffle.tweaker.min.step": "1",
+      "shuffle.tweaker.max.step": "10"
+    },
+    "clusterCosts": [
+    {
+      "cost": "org.astraea.common.cost.ReplicaLeaderCost",
+      "weight": 1
+    }
+    ],
+      "moveCosts": [
+        "org.astraea.common.cost.PartitionMigrateTimeCost"
     ]
-}'
+  }'
 ```
 
 
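@@ -139,42 +142,44 @@ curl -X POST http://localhost:8001/topics \
 
 | Run                       | 1    | 2    | 3    |
 | ------------------------- | ---- | ---- | ---- |
-| Actual migration time (s) | 443  | 411  | 437  |
+| Actual migration time (s) | 570  | 494  | 523  |
 
 
 
 ### Limiting the Migration Time
 
-1. After the producers finish sending data, run the following command. This time the migration time is limited to 300 seconds, and we check how far the actual migration time deviates from the limit
+1. After the producers finish sending data, run the following command. This time the migration time is limited to 400 seconds, and we check how far the actual migration time deviates from the limit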
 
 ```shell
-curl -X POST http://localhost:8001/topics \
+curl -X POST http://localhost:8001/balancer \
   -H "Content-Type: application/json" \
   -d '{
-  	"timeout": "30s",
-  	"balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
-  	"balancerConfig": {
-  	  "shuffle.tweaker.min.step": "1",
-  	  "shuffle.tweaker.max.step": "10"
- 	 },
-  	"clusterCosts": [
-        {
-        	"cost": "org.astraea.common.cost.ReplicaLeaderCost",
-        	"weight": 1
-        }
-    ],
-    "costConfig": {
-    	"max.migrated.time.limit": "300s"
-    }
-}'
+       "timeout":"30s",
+       "balancer":"org.astraea.common.balancer.algorithms.GreedyBalancer",
+       "balancerConfig":{
+          "shuffle.tweaker.min.step":"1",
+          "shuffle.tweaker.max.step":"10"
+       },
+       "moveCosts":[
+          "org.astraea.common.cost.BrokerDiskSpaceCost"
+       ],
+       "clusterCosts":[
+          {
+             "cost":"org.astraea.common.cost.ReplicaLeaderCost",
+             "weight":1
+          }
+       ],
+       "costConfig": {
+         "max.migrated.time.limit": "400s"
+       }
+    }'
 ```
 
 
 
-| Run                       | 1           | 2            | 3             | 4             | 5             |
-| ------------------------- | ----------- | ------------ | ------------- | ------------- | ------------- |
-| Migration time limit (s)  | 300         | 300          | 300           | 300           | 300           |
-| Actual migration time (s) | 345         | 345          | 315           | 285           | 330           |
-| Error                     | 0.130434783 | 0.1304347826 | 0.04761904762 | 0.05263157895 | 0.09090909091 |
+| Run                          | 1             | 2             | 3             | 4             | 5              |
+| ---------------------------- | ------------- | ------------- | ------------- | ------------- | -------------- |
+| Estimated migration time (s) | 400           | 399           | 399           | 398           | 338            |
+| Actual migration time (s)    | 406           | 389           | 404           | 387           | 341            |
+| Error                        | 0.01477832512 | 0.02570694087 | 0.01237623762 | 0.02842377261 | 0.008797653959 |
 
-### 

From 0ea83e2291cc37c47a021fa3b5556f03b301683e Mon Sep 17 00:00:00 2001
From: qoo332001 <sean0651101@gmail.com>
Date: Sun, 28 May 2023 00:12:35 +0800
Subject: [PATCH 3/3] update docs

---
 docs/balancer/README.md                          |  4 +---
 docs/balancer/experiment_partitionMigrateTime.md | 14 ++++----------
 2 files changed, 5 insertions(+), 13 deletions(-)
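Note on the request bodies: the two `curl` calls in the experiment doc differ mainly in the `moveCosts` entry and in whether `costConfig` carries `max.migrated.time.limit`. The Python sketch below assembles such a body so the JSON stays well-formed; it only builds the string and sends nothing. The helper name `build_balancer_request` is hypothetical, the field names and class names are copied from the doc, and pairing `MigrateTimeCost` with the time limit is an assumption (the limited request in the doc lists `BrokerDiskSpaceCost`).

```python
import json
from typing import Optional


def build_balancer_request(move_cost: str, time_limit: Optional[str] = None) -> str:
    """Return a JSON body for POST /balancer as a string (hypothetical helper)."""
    body = {
        "timeout": "60s",
        "balancer": "org.astraea.common.balancer.algorithms.GreedyBalancer",
        "balancerConfig": {
            "shuffle.tweaker.min.step": "1",
            "shuffle.tweaker.max.step": "10",
        },
        "clusterCosts": [
            {"cost": "org.astraea.common.cost.ReplicaLeaderCost", "weight": 1}
        ],
        "moveCosts": [move_cost],
    }
    # Only add the limit when one is requested, mirroring the two requests in the doc.
    if time_limit is not None:
        body["costConfig"] = {"max.migrated.time.limit": time_limit}
    return json.dumps(body, indent=2)


print(build_balancer_request("org.astraea.common.cost.MigrateTimeCost", "400s"))
```

The printed string can be passed to `curl -d` against the `/balancer` endpoint used in the doc.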

diff --git a/docs/balancer/README.md b/docs/balancer/README.md
index 4435d47c4b..fb683c3594 100644
--- a/docs/balancer/README.md
+++ b/docs/balancer/README.md
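@@ -16,6 +16,4 @@ Astraea Balancer is a Kafka broker-side load optimization framework that, by using
 
 ## Cost Estimation
 
-* Cost estimation experiment reports
-  * [Disk space limit experiment](experiment_brokerDiskSpace.md): Migrating Kafka partitions incurs costs. Before migrating, estimate the broker and disk space the migration may occupy and cap it, so that the migration does not take up too much storage.
-  * [Migration time limit experiment](experiment_partitionMigrateTime.md): Migrating Kafka partitions incurs costs. Before migrating, estimate how long the migration may take and cap it, so that the migration does not take too long.
+* [Migration time limit experiment](experiment_partitionMigrateTime.md): Migrating Kafka partitions incurs costs. Before migrating, estimate how long the migration may take and cap it, so that the migration does not exceed the time limit.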
diff --git a/docs/balancer/experiment_partitionMigrateTime.md b/docs/balancer/experiment_partitionMigrateTime.md
index 091fb1b89c..d4b0b11e2c 100644
--- a/docs/balancer/experiment_partitionMigrateTime.md
+++ b/docs/balancer/experiment_partitionMigrateTime.md
@@ -60,19 +60,13 @@
 
 ### Creating the Kafka Cluster
 
-Build the cluster according to the environment described above. You can use the project's
-[./docker/start_controller.sh](https://github.com/skiptests/astraea/blob/main/docs/run_kafka_broker.md#broker-with-kraft) to create the cluster.
+Build the cluster according to the environment described above. You can use the project's [./docker/start_controller.sh](https://github.com/skiptests/astraea/blob/main/docs/run_kafka_broker.md#broker-with-kraft) to create the cluster.
 
 ## Performance Metrics Collection
 
-All performance metrics in this experiment come from each Kafka Broker's JMX information, which jmx_exporter exports in a format that Prometheus accepts
-and which is then plotted in Grafana. We also tracked actual hardware resource usage during the experiment, using the node exporter running on each machine and Prometheus
-to collect low-level hardware performance data.
+The performance metrics for this experiment are low-level hardware performance data collected by the node exporter running on each machine, together with Prometheus.
 
-You can use the project's
-[./docker/start_node_exporter.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_node_exporter.md),
-[./docker/start_prometheus.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_prometheus.md) and
-[./docker/start_grafana.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_grafana.md) to build the monitoring environment.
+For details, see [./docker/start_node_exporter.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_node_exporter.md), [./docker/start_prometheus.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_prometheus.md) and [./docker/start_grafana.sh](https://github.com/skiptests/astraea/blob/7596f590ae0f0ec370a6e257c10cc2aeb5fb5bf4/docs/run_grafana.md).
 
 The Dashboard used in this experiment can be found [here](resources/experiment_1_grafana-1663659783116.json).
 
@@ -131,7 +125,7 @@ curl -X POST http://localhost:8001/balancer \
     }
     ],
       "moveCosts": [
-        "org.astraea.common.cost.PartitionMigrateTimeCost"
+        "org.astraea.common.cost.MigrateTimeCost"
     ]
   }'
 ```