This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Add EC performance test scripts #2059

Merged · 10 commits · Jun 12, 2019
24 changes: 24 additions & 0 deletions supports/ec-performance-test/README.md
@@ -0,0 +1,24 @@
# Performance Test for EC

## Requirements
- Deploy SSM; refer to /SSM/doc/ssm-deployment-guide.md.
- Deploy an HDFS cluster and add its bin directory to the OS $PATH.
- Install MySQL for SSM to store its metadata.
- Install PAT (https://github.com/intel-hadoop/PAT).

## Configuration
Edit the file named config. For each test case, the corresponding test data should be created in the HDFS cluster beforehand by executing 'prepare.sh'.
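Each test case maps to an HDFS directory named "size_num", matching the CASES associative array in the config file. A minimal sketch of the naming convention, with hypothetical case values:

```shell
#!/usr/bin/env bash
# Hypothetical test cases (file size -> file count), in the same form
# as the CASES array declared in config.
declare -A CASES=(["10MB"]="10" ["100MB"]="5")

# Each case's data lives under /<size>_<num> in HDFS.
for size in "${!CASES[@]}"; do
    echo "/${size}_${CASES[$size]}"
done
```

prepare.sh creates these directories and fills them with TestDFSIO output before any test runs.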

## SSM ec test
1. Run `./test_ssm_ec_performance.sh`.
2. A file named ssm.log under this directory records the time for each round of the test. The SSM log and PAT data are collected in ${PAT_HOME}/PAT-collecting-data/results.

Note: the rule check interval in run_ssm_ec.py is set to a long period so that the rule is checked only once during the test. This avoids generating a large number of redundant cmdlets and makes the measured execution time more accurate.
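The rule that run_ssm_ec.py submits can be sketched as follows (case name hypothetical); the long `every 500min` interval is what keeps the rule from being re-checked mid-test:

```shell
#!/usr/bin/env bash
# Compose the SSM EC rule string for a hypothetical 10MB x 10 case,
# mirroring the string built in run_ssm_ec.py.
case_name="10MB_10"
rule="file: every 500min|path matches \"/${case_name}/*\"|ec -policy RS-6-3-1024k"
echo "${rule}"
```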
## HDFS distcp ec test
1. Launch Yarn on the test cluster.
2. Run `./test_distcp_ec.sh`.
3. A file named distcp.log under this directory records the time. The distcp logs and PAT data are collected in ${PAT_HOME}/PAT-collecting-data/results.

## Other test scripts
The script 'test_ssm_ec_only.sh' runs the SSM ec test once, without the unec operation.
The script 'test_ssm_unec_only.sh' runs the SSM unec test once, without the ec operation.
The script 'test_distcp_replica.sh' copies files that have already been converted to an EC policy into a directory whose storage policy is 3-replica.
35 changes: 35 additions & 0 deletions supports/ec-performance-test/config
@@ -0,0 +1,35 @@
# These test cases specify file size and file count. The base test dir name is "size_num".
# The files for these cases should be created under those dirs in advance (see prepare.sh).

# Test cases, e.g., declare -A CASES=(["10MB"]="10" ["100MB"]="10")
declare -A CASES=([]="")

# SSM home, e.g., SMART_HOME=~/smart-data-1.4.0
SMART_HOME=

# PAT home, e.g., PAT_HOME=~/PAT
PAT_HOME=

# Hadoop home, e.g., HADOOP_HOME=~/hadoop
HADOOP_HOME=

# The cluster hdfs url for ec performance test, e.g., SRC_CLUSTER=hdfs://sr613:9000
SRC_CLUSTER=

# The dest dir to store the data converted to ec by distcp
DEST_DIR_EC=

# The dest dir to store the data converted to replica by distcp
DEST_DIR_REPLICA=

# The namenode's hostname of src cluster
SRC_NODE=

# The namenode's hostname for remote HDFS cluster
Review comment (Member): Can be removed and use SRC_NODE instead?

REMOTE_NAMENODE=

# The hosts require dropping cache, e.g., HOSTS="host1 host2"
HOSTS=

# The number of mappers for distcp, e.g., MAPPER_NUM="30 60 90"
Review comment (Member): Add a comment: for the sake of fair comparison, please make sure this value is consistent with the overall cmdlet executors of SSM.

MAPPER_NUM=
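For reference, a fully filled-in config might look like the fragment below; all hostnames and paths are hypothetical placeholders, not values from this PR:

```shell
# Example config (hypothetical values).
declare -A CASES=(["10MB"]="10" ["100MB"]="10")
SMART_HOME=~/smart-data-1.4.0
PAT_HOME=~/PAT
HADOOP_HOME=~/hadoop
SRC_CLUSTER=hdfs://namenode1:9000
DEST_DIR_EC=ec_dest
DEST_DIR_REPLICA=replica_dest
SRC_NODE=namenode1
REMOTE_NAMENODE=namenode1
HOSTS="namenode1 datanode1 datanode2"
# Keep consistent with SSM's overall cmdlet executors for a fair comparison.
MAPPER_NUM="30"
```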
10 changes: 10 additions & 0 deletions supports/ec-performance-test/drop_cache.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

. ./config

# Drop the OS page cache on the hosts specified in config (requires root on each host).
drop_cache="sync; echo 3 > /proc/sys/vm/drop_caches"
echo "Dropping cache on ${HOSTS}."
for host in ${HOSTS}; do
ssh "${host}" "${drop_cache}"
done
14 changes: 14 additions & 0 deletions supports/ec-performance-test/prepare.sh
@@ -0,0 +1,14 @@
#!/usr/bin/env bash

. ./config

# Generate test data using TestDFSIO.
for size in "${!CASES[@]}"; do
num=${CASES[$size]}
dir="${size}_${num}"
ssh ${REMOTE_NAMENODE} "hdfs dfs -mkdir /${dir}"
# Use a version-agnostic tests jar so the script is not tied to one Hadoop build.
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles ${num} -size ${size}
Review comment (Member): We can use hadoop-mapreduce-client-jobclient-*-tests.jar instead.

ssh ${REMOTE_NAMENODE} "hdfs dfs -mv /benchmarks/TestDFSIO/io_data/* /${dir}"
ssh ${REMOTE_NAMENODE} "hdfs dfs -rm -r /benchmarks"
done

14 changes: 14 additions & 0 deletions supports/ec-performance-test/prepare_ec.sh
@@ -0,0 +1,14 @@
#!/usr/bin/env bash

. ./config

# delete historical data and set ec policy
for size in "${!CASES[@]}"; do
num=${CASES[$size]}
dir="${size}_${num}"
# delete historical data
ssh ${SRC_NODE} "hdfs dfs -rm -r /${DEST_DIR_EC}/${dir}; hdfs dfs -mkdir /${DEST_DIR_EC}/${dir}"
# set ec policy
ssh ${SRC_NODE} "hdfs ec -setPolicy -path /${DEST_DIR_EC}/${dir}"
done

11 changes: 11 additions & 0 deletions supports/ec-performance-test/prepare_replica.sh
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

. ./config

# delete historical data and mkdir
for size in "${!CASES[@]}"; do
num=${CASES[$size]}
dir="${size}_${num}"
ssh ${SRC_NODE} "hdfs dfs -rm -r /${DEST_DIR_REPLICA}/${dir}; hdfs dfs -mkdir /${DEST_DIR_REPLICA}/${dir}"
done

39 changes: 39 additions & 0 deletions supports/ec-performance-test/run_ssm_ec.py
@@ -0,0 +1,39 @@
import sys
import time
from util import *

size = sys.argv[1]
num = sys.argv[2]
case = size + "_" + num
log = sys.argv[3]
action = sys.argv[4]

if action == "ec":
rid = submit_rule("file: every 500min|path matches \"/" + case + "/*\"|ec -policy RS-6-3-1024k")
elif action == "unec":
rid = submit_rule("file: every 500min|path matches \"/" + case + "/*\" | unec")

start_rule(rid)
start_time = time.time()
# Poll until the rule has generated one cmdlet per test file.
cids = get_cids_of_rule(rid)
while len(cids) < int(num):
    time.sleep(.1)
    cids = get_cids_of_rule(rid)
rule = get_rule(rid)
if len(cids) != rule['numCmdsGen']:
    print("Num Error")
else:
    wait_cmdlets(cids)
end_time = time.time()
stop_rule(rid)
# append result to log file
f = open(log, 'a')
f.write(str(int(end_time - start_time)) + "s" + " " + '\n')
f.close()
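Each round appends one elapsed-time entry to the log in the form `<whole seconds>s `; the format can be sketched locally without an SSM cluster (timestamps hypothetical):

```shell
#!/usr/bin/env bash
# Mirror run_ssm_ec.py's log line: integer seconds, an "s", a trailing
# space, then a newline.
start_time=100    # hypothetical epoch timestamp at rule start
end_time=142      # hypothetical epoch timestamp after wait_cmdlets
printf '%ds \n' "$((end_time - start_time))"
```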
36 changes: 36 additions & 0 deletions supports/ec-performance-test/test_distcp_ec.sh
@@ -0,0 +1,36 @@
#!/usr/bin/env bash

echo "Get configuration from config."
. config
echo "------------------ Your configuration ------------------"
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/distcp.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
case=${size}_${CASES[$size]}
printf "Test case ${case} with ${MAPPER_NUM} mappers:\n ec\n" >> ${log}
for i in {1..3}; do
echo "==================== test case: $case, mapper num: ${MAPPER_NUM}, test round: $i ============================"
sh drop_cache.sh
# delete historical data and set ec policy
sh prepare_ec.sh
cd ${PAT_HOME}/PAT-collecting-data
echo "start_time=\`date +%s\`;\
hadoop distcp -skipcrccheck -m ${MAPPER_NUM} ${SRC_CLUSTER}/${case}/* ${SRC_CLUSTER}/${DEST_DIR_EC}/${case}/ > results/${case}_${MAPPER_NUM}_${i}.log 2>&1;\
end_time=\`date +%s\`;\
printf \"\$((end_time-start_time))s \" >> ${log}" > cmd.sh
./pat run "${case}_ec_${MAPPER_NUM}_${i}"
cd ${bin}
done
printf "\nTest case ${case} with ${MAPPER_NUM} mappers is finished!\n" >> ${log}
done

36 changes: 36 additions & 0 deletions supports/ec-performance-test/test_distcp_replica.sh
@@ -0,0 +1,36 @@
#!/usr/bin/env bash

echo "Get configuration from config."
. config
echo "------------------ Your configuration ------------------"
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/distcp.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
case=${size}_${CASES[$size]}
printf "Test case ${case} with ${MAPPER_NUM} mappers:\n replica\n" >> ${log}
for i in {1..3}; do
echo "==================== test case: $case, mapper num: ${MAPPER_NUM}, test round: $i ============================"
sh drop_cache.sh
# delete historical data and mkdir
sh prepare_replica.sh
cd ${PAT_HOME}/PAT-collecting-data
echo "start_time=\`date +%s\`;\
hadoop distcp -skipcrccheck -m ${MAPPER_NUM} ${SRC_CLUSTER}/${DEST_DIR_EC}/${case}/* ${SRC_CLUSTER}/${DEST_DIR_REPLICA}/${case}/ > results/${case}_${MAPPER_NUM}_${i}.log 2>&1;\
end_time=\`date +%s\`;\
printf \"\$((end_time-start_time))s \" >> ${log}" > cmd.sh
./pat run "${case}_replica_${MAPPER_NUM}_${i}"
cd ${bin}
done
printf "\nTest case ${case} with ${MAPPER_NUM} mappers is finished!\n" >> ${log}
done

39 changes: 39 additions & 0 deletions supports/ec-performance-test/test_ssm_ec_only.sh
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

# Test ec conversion for 1 round
bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
case="${size}_${CASES[$size]}"
action="ec"
echo "Test case ${case}($action):" >> ${log}
echo "==================== test case: $case, test round: 1 ============================"
sh drop_cache.sh
# make ssm log empty before test
printf "" > ${SMART_HOME}/logs/smartserver.log
cd ${PAT_HOME}/PAT-collecting-data
echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
./pat run "${case}_${action}"
cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${action}.log
cd ${bin}
printf "\nTest case ${case} is finished!\n" >> ${log}
done
54 changes: 54 additions & 0 deletions supports/ec-performance-test/test_ssm_ec_performance.sh
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}

# Test ec and unec continuously for 3 rounds
for size in "${!CASES[@]}"; do
case="${size}_${CASES[$size]}"
for i in {1..3}; do
# ec
action="ec"
echo "Test case ${case}($action):" >> ${log}
echo "==================== test case: $case, test round: $i ============================"
sh drop_cache.sh
# make ssm log empty before test
printf "" > ${SMART_HOME}/logs/smartserver.log
cd ${PAT_HOME}/PAT-collecting-data
echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
./pat run "${case}_${i}_${action}"
cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${i}-${action}.log
cd ${bin}
# unec
action="unec"
echo "Test case ${case}($action):" >> ${log}
echo "==================== test case: $case, test round: $i ============================"
sh drop_cache.sh
cd ${PAT_HOME}/PAT-collecting-data
echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
./pat run "${case}_${i}_${action}"
cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${i}-${action}.log
cd ${bin}
done
printf "\nTest case ${case} is finished!\n" >> ${log}
done
39 changes: 39 additions & 0 deletions supports/ec-performance-test/test_ssm_unec_only.sh
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

# Test unec for 1 round
bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
case="${size}_${CASES[$size]}"
# make ssm log empty before test
printf "" > ${SMART_HOME}/logs/smartserver.log
action="unec"
echo "Test case ${case}($action):" >> ${log}
echo "==================== test case: $case, test round: 1 ============================"
sh drop_cache.sh
cd ${PAT_HOME}/PAT-collecting-data
echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
./pat run "${case}_${action}"
cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${action}.log
cd ${bin}
printf "\nTest case ${case} is finished!\n" >> ${log}
done