This repository has been archived by the owner on Jan 3, 2023. It is now read-only.
Add EC performance test scripts #2059
Merged
Changes from 9 commits
Commits
10 commits
982b928 add ec test
f174203 add httpAdapter
15bf34a refine
c39bd98 refine
ac763a5 add README
9227b4f improve the waiting logic
32096d5 refine
0f401ba improve wait logic
9439983 generalize
cdcdc7b refine
@@ -0,0 +1,24 @@
# Performance Test for Data Sync

## Requirements
- Deploy SSM; please refer to /SSM/doc/ssm-deployment-guide.md.
- Deploy one HDFS cluster and add its bin directory to the $PATH of the OS.
- Install MySQL for SSM to store metadata.
- Install PAT (https://github.com/intel-hadoop/PAT).

## Configuration
Configure the file named config. For each test case, the corresponding test data should be created in the HDFS cluster beforehand by executing 'prepare.sh'.

## SSM ec test
1. Run `./test_ssm_ec_performance.sh`
2. A file named ssm.log under this directory will record the time for each round of the test. The SSM log and PAT data will be collected in ${PAT_HOME}/PAT-collecting-data/results.

Note: The rule check interval in run_ssm_ec.py is set to a long period to ensure that the rule check is conducted only once during the test. This avoids generating a large number of redundant cmdlets and makes the measured execution time more accurate.

## HDFS distcp ec test
1. Yarn should be launched for the test cluster.
2. Run `./test_distcp_ec.sh`
3. A file named distcp.log under this directory will record the time. The distcp logs and PAT data will be collected in ${PAT_HOME}/PAT-collecting-data/results.

## Other test scripts
The script 'test_ssm_ec_only.sh' tests SSM ec once, without the unec operation.
The script 'test_ssm_unec_only.sh' tests SSM unec once, without the ec operation.
The script 'test_distcp_replica.sh' copies files that have already been converted to an ec policy into a dir whose policy is 3-replica.
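
The note on the rule check interval in the SSM ec test section can be illustrated with a little arithmetic. The sketch below is only an illustration, not SSM code: it assumes the rule is checked once when it starts and then once per full interval, and the function name `expected_rule_checks` is hypothetical. The 500-minute value is the one used in run_ssm_ec.py.

```python
def expected_rule_checks(test_seconds, interval_minutes):
    """Estimate how many rule checks fire during a test.

    Assumption (not confirmed by the source): one check fires when the
    rule starts, then one more after every full interval.
    """
    interval_seconds = interval_minutes * 60
    return 1 + int(test_seconds // interval_seconds)


# A 20-minute test with the 500-minute interval is checked once.
print(expected_rule_checks(20 * 60, 500))  # 1
# The same test with a 1-minute interval would be checked many times.
print(expected_rule_checks(20 * 60, 1))    # 21
```

Under that assumption, any interval longer than the test duration yields a single check, which is why the exact value matters less than it being long.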
@@ -0,0 +1,35 @@
# These are test cases which specify file size and num. The base sync dir name is "size_num".
# The files for these cases should be created under them in advance.

# Test cases, e.g., declare -A CASES=(["10MB"]="10" ["100MB"]="10")
declare -A CASES=()

# SSM home, e.g., SMART_HOME=~/smart-data-1.4.0
SMART_HOME=

# PAT home, e.g., PAT_HOME=~/PAT
PAT_HOME=

# Hadoop home, e.g., HADOOP_HOME=~/hadoop
HADOOP_HOME=

# The cluster hdfs url for ec performance test, e.g., SRC_CLUSTER=hdfs://sr613:9000
SRC_CLUSTER=

# The dest dir to store the data converted to ec by distcp
DEST_DIR_EC=

# The dest dir to store the data converted to replica by distcp
DEST_DIR_REPLICA=

# The namenode's hostname of src cluster
SRC_NODE=

# The namenode's hostname for remote HDFS cluster
REMOTE_NAMENODE=

# The hosts whose cache needs to be dropped, e.g., HOSTS="host1 host2"
HOSTS=

# The number of mappers for distcp, e.g., MAPPER_NUM="30 60 90"

Review comment: Add a comment: For the sake of fair comparison, please make sure this value is consistent with the overall cmdlet executors of SSM.

MAPPER_NUM=
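
To make the "size_num" naming convention in this config concrete, here is a small sketch of how the case directory names are derived from the CASES mapping. It mirrors the `dir="${size}_${num}"` line used by the scripts in this PR; the function name `case_dirs` is hypothetical, and the sample values are the ones from the config comment.

```python
def case_dirs(cases):
    """Return the HDFS directory name ("size_num") for each test case."""
    return [f"{size}_{num}" for size, num in cases.items()]


# Same shape as: declare -A CASES=(["10MB"]="10" ["100MB"]="10")
cases = {"10MB": "10", "100MB": "10"}
print(case_dirs(cases))  # ['10MB_10', '100MB_10']
```

One difference worth keeping in mind: a Python dict preserves insertion order, while bash does not guarantee any iteration order for associative arrays, so the scripts may run the cases in a different order than they are declared.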
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

. ./config

# drop cache for hosts specified in config.
drop_cache="sync;echo 3 > /proc/sys/vm/drop_caches"
echo "drop cache for ${HOSTS}."
for host in ${HOSTS}; do
  ssh $host "${drop_cache}"
done
@@ -0,0 +1,14 @@
#!/usr/bin/env bash

. ./config

# generate test data using DFSIO
for size in "${!CASES[@]}"; do
  num=${CASES[$size]}
  dir="${size}_${num}"
  ssh ${REMOTE_NAMENODE} "hdfs dfs -mkdir /${dir}"
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.2.0-SNAPSHOT-tests.jar TestDFSIO -write -nrFiles $(($num)) -size ${size}

Review comment: We can use hadoop-mapreduce-client-jobclient-*-tests.jar instead.

  # move the generated files into the case directory, then clean up
  ssh ${REMOTE_NAMENODE} "hdfs dfs -mv /benchmarks/TestDFSIO/io_data/* /${dir}"
  ssh ${REMOTE_NAMENODE} "hdfs dfs -rm -r /benchmarks"
done
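
The review comment above suggests matching the jar with a wildcard rather than hard-coding the 3.2.0-SNAPSHOT version. A minimal sketch of that idea follows. The function name `find_jobclient_tests_jar` is hypothetical, and the directory layout under the Hadoop home is taken from the path used in the script.

```python
import glob
import os


def find_jobclient_tests_jar(hadoop_home):
    """Locate the jobclient tests jar without hard-coding its version."""
    pattern = os.path.join(
        hadoop_home, "share", "hadoop", "mapreduce",
        "hadoop-mapreduce-client-jobclient-*-tests.jar",
    )
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no jar matches " + pattern)
    # If several versions are present, this picks the first one in sorted order.
    return matches[0]
```

Failing loudly when nothing matches seems preferable to passing an unexpanded pattern on to `hadoop jar`, but that is a design choice rather than something the PR specifies.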
@@ -0,0 +1,14 @@
#!/usr/bin/env bash

. ./config

# delete historical data and set ec policy
for size in "${!CASES[@]}"; do
  num=${CASES[$size]}
  dir="${size}_${num}"
  # delete historical data
  ssh ${SRC_NODE} "hdfs dfs -rm -r /${DEST_DIR_EC}/${dir}; hdfs dfs -mkdir /${DEST_DIR_EC}/${dir}"
  # set ec policy
  ssh ${SRC_NODE} "hdfs ec -setPolicy -path /${DEST_DIR_EC}/${dir}"
done
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

. ./config

# delete historical data and mkdir
for size in "${!CASES[@]}"; do
  num=${CASES[$size]}
  dir="${size}_${num}"
  ssh ${SRC_NODE} "hdfs dfs -rm -r /${DEST_DIR_REPLICA}/${dir}; hdfs dfs -mkdir /${DEST_DIR_REPLICA}/${dir}"
done
@@ -0,0 +1,39 @@
import sys
import time
from util import *

size = sys.argv[1]
num = sys.argv[2]
case = size + "_" + num
log = sys.argv[3]
action = sys.argv[4]

if action == "ec":
    rid = submit_rule("file: every 500min|path matches \"/" + case + "/*\"|ec -policy RS-6-3-1024k")
elif action == "unec":
    rid = submit_rule("file: every 500min|path matches \"/" + case + "/*\" | unec")

start_rule(rid)
start_time = time.time()
rule = get_rule(rid)
last_checked = rule['numChecked']
last_cmdsgen = rule['numCmdsGen']
time.sleep(.1)
cids = get_cids_of_rule(rid)
# poll until one cmdlet per file has been generated
while len(cids) < int(num):
    time.sleep(.1)
    rule = get_rule(rid)
    cids = get_cids_of_rule(rid)
time.sleep(.1)
cids = get_cids_of_rule(rid)
last_cmdsgen = rule['numCmdsGen']
if len(cids) != last_cmdsgen:
    print("Num Error")
else:
    wait_cmdlets(cids)
end_time = time.time()
stop_rule(rid)
# append result to log file
with open(log, 'a') as f:
    f.write(str(int(end_time - start_time)) + "s" + " " + '\n')
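
The waiting loop in this script polls until enough cmdlets exist and has no upper bound, so a stalled rule would block the test forever. As a hedged sketch of one alternative, the helper below polls a condition with a timeout. `wait_until` is a hypothetical name, not part of the PR's util module, and the default timeout is an arbitrary assumption.

```python
import time


def wait_until(predicate, timeout_s=60.0, poll_s=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or timeout_s has elapsed.

    Returns True if the condition was met, False on timeout.
    clock and sleep are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In the script above, the condition could be written as `wait_until(lambda: len(get_cids_of_rule(rid)) >= int(num))`, with `get_cids_of_rule` coming from the existing util module; the caller would then decide what to log when the result is False.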
@@ -0,0 +1,36 @@
#!/usr/bin/env bash

echo "Get configuration from config."
. config
echo "------------------ Your configuration ------------------"
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
  echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/distcp.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
  case=${size}_${CASES[$size]}
  printf "Test case ${case} with ${MAPPER_NUM} mappers:\n ec\n" >> ${log}
  for i in {1..3}; do
    echo "==================== test case: $case, mapper num: ${MAPPER_NUM}, test round: $i ============================"
    # the helper scripts use bash-only features, so run them with bash
    bash drop_cache.sh
    # delete historical data and set ec policy
    bash prepare_ec.sh
    cd ${PAT_HOME}/PAT-collecting-data
    echo "start_time=\`date +%s\`;\
    hadoop distcp -skipcrccheck -m ${MAPPER_NUM} ${SRC_CLUSTER}/${case}/* ${SRC_CLUSTER}/${DEST_DIR_EC}/${case}/ > results/${case}_${MAPPER_NUM}_${i}.log 2>&1;\
    end_time=\`date +%s\`;\
    printf \"\$((end_time-start_time))s \" >> ${log}" > cmd.sh
    ./pat run "${case}_ec_${MAPPER_NUM}_${i}"
    cd ${bin}
  done
  printf "\nTest case ${case} with ${MAPPER_NUM} mappers is finished!\n" >> ${log}
done
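
The config comment gives `MAPPER_NUM="30 60 90"` as an example, while this script passes `${MAPPER_NUM}` to `-m` as a single value. If several mapper counts are meant to be tested, one command per count would be needed. The sketch below shows that expansion under that assumption; `distcp_commands` and the sample destination name `ec_dest` are hypothetical, and the `-skipcrccheck` and `-m` flags are simply the ones the script already uses.

```python
def distcp_commands(src_cluster, dest_dir, case, mapper_nums):
    """Build one distcp argument list per mapper count.

    mapper_nums is a space-separated string such as "30 60 90".
    """
    commands = []
    for mappers in mapper_nums.split():
        commands.append([
            "hadoop", "distcp", "-skipcrccheck", "-m", mappers,
            f"{src_cluster}/{case}/*",
            f"{src_cluster}/{dest_dir}/{case}/",
        ])
    return commands


for cmd in distcp_commands("hdfs://sr613:9000", "ec_dest", "10MB_10", "30 60 90"):
    print(" ".join(cmd))
```

With a single value in MAPPER_NUM the loop produces exactly one command, so the existing single-value usage keeps working.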
@@ -0,0 +1,36 @@
#!/usr/bin/env bash

echo "Get configuration from config."
. config
echo "------------------ Your configuration ------------------"
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
  echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/distcp.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
  case=${size}_${CASES[$size]}
  printf "Test case ${case} with ${MAPPER_NUM} mappers:\n replica\n" >> ${log}
  for i in {1..3}; do
    echo "==================== test case: $case, mapper num: ${MAPPER_NUM}, test round: $i ============================"
    # the helper scripts use bash-only features, so run them with bash
    bash drop_cache.sh
    # delete historical data and mkdir
    bash prepare_replica.sh
    cd ${PAT_HOME}/PAT-collecting-data
    echo "start_time=\`date +%s\`;\
    hadoop distcp -skipcrccheck -m ${MAPPER_NUM} ${SRC_CLUSTER}/${DEST_DIR_EC}/${case}/* ${SRC_CLUSTER}/${DEST_DIR_REPLICA}/${case}/ > results/${case}_${MAPPER_NUM}_${i}.log 2>&1;\
    end_time=\`date +%s\`;\
    printf \"\$((end_time-start_time))s \" >> ${log}" > cmd.sh
    ./pat run "${case}_replica_${MAPPER_NUM}_${i}"
    cd ${bin}
  done
  printf "\nTest case ${case} with ${MAPPER_NUM} mappers is finished!\n" >> ${log}
done
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
  echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

# Test ec conversion for 1 round
bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
  case="${size}_${CASES[$size]}"
  action="ec"
  echo "Test case ${case}($action):" >> ${log}
  echo "==================== test case: $case, test round: 1 ============================"
  sh drop_cache.sh
  # make ssm log empty before test
  printf "" > ${SMART_HOME}/logs/smartserver.log
  cd ${PAT_HOME}/PAT-collecting-data
  echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
  python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
  ./pat run "${case}_${action}"
  cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${action}.log
  cd ${bin}
  printf "\nTest case ${case} is finished!\n" >> ${log}
done
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
  echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}

# Test ec and unec continuously for 3 rounds
for size in "${!CASES[@]}"; do
  case="${size}_${CASES[$size]}"
  for i in {1..3}; do
    # ec
    action="ec"
    echo "Test case ${case}($action):" >> ${log}
    echo "==================== test case: $case, test round: $i ============================"
    sh drop_cache.sh
    # make ssm log empty before test
    printf "" > ${SMART_HOME}/logs/smartserver.log
    cd ${PAT_HOME}/PAT-collecting-data
    echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
    python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
    ./pat run "${case}_${i}_${action}"
    cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${i}-${action}.log
    cd ${bin}
    # unec
    action="unec"
    echo "Test case ${case}($action):" >> ${log}
    echo "==================== test case: $case, test round: $i ============================"
    sh drop_cache.sh
    cd ${PAT_HOME}/PAT-collecting-data
    echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
    python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
    ./pat run "${case}_${i}_${action}"
    cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${i}-${action}.log
    cd ${bin}
  done
  printf "\nTest case ${case} is finished!\n" >> ${log}
done
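
This script leaves ssm.log with a "Test case <case>(<action>):" header before each round, followed by the elapsed time that run_ssm_ec.py appends as "<seconds>s". As a rough sketch of how those rounds could be averaged afterwards, the helper below parses that layout. `summarize_ssm_log` is a hypothetical name, and the sample timings are made up purely to show the format.

```python
import re
from collections import defaultdict


def summarize_ssm_log(log_text):
    """Return the mean elapsed seconds for each (case, action) pair."""
    times = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        header = re.match(r"Test case (\S+)\((\w+)\):", line)
        if header:
            current = (header.group(1), header.group(2))
            continue
        elapsed = re.fullmatch(r"\s*(\d+)s\s*", line)
        if elapsed and current is not None:
            times[current].append(int(elapsed.group(1)))
    return {key: sum(vals) / len(vals) for key, vals in times.items()}


# Illustrative log content only; the numbers are not measured results.
sample = (
    "Test case 10MB_10(ec):\n12s \n"
    "Test case 10MB_10(unec):\n8s \n"
    "Test case 10MB_10(ec):\n14s \n"
    "\nTest case 10MB_10 is finished!\n"
)
print(summarize_ssm_log(sample))
```

The "is finished!" lines are skipped because they carry no parenthesized action, so they do not reset the current case.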
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
# avoid blocking REST API
unset http_proxy
# for python use
export PYTHONPATH=../integration-test:$PYTHONPATH

echo "Get configuration from config."
source config
echo "------------------ Your configuration ------------------"
echo "SSM home is ${SMART_HOME}."
echo "PAT home is ${PAT_HOME}."
echo "Test case:"
for size in ${!CASES[@]}; do
  echo ${size} ${CASES[$size]}
done
echo "--------------------------------------------------------"

# Test unec for 1 round
bin=$(dirname "${BASH_SOURCE-$0}")
bin=$(cd "${bin}">/dev/null; pwd)
log="${bin}/ssm.log"
# remove historical data in log file
printf "" > ${log}
for size in "${!CASES[@]}"; do
  case="${size}_${CASES[$size]}"
  # make ssm log empty before test
  printf "" > ${SMART_HOME}/logs/smartserver.log
  action="unec"
  echo "Test case ${case}($action):" >> ${log}
  echo "==================== test case: $case, test round: 1 ============================"
  sh drop_cache.sh
  cd ${PAT_HOME}/PAT-collecting-data
  echo "export PYTHONPATH=${bin}/../integration-test:${PYTHONPATH};\
  python ${bin}/run_ssm_ec.py ${size} ${CASES[$size]} ${log} ${action}" > cmd.sh
  ./pat run "${case}_${action}"
  cp ${SMART_HOME}/logs/smartserver.log ./results/${case}-${action}.log
  cd ${bin}
  printf "\nTest case ${case} is finished!\n" >> ${log}
done
Review comment: Can this be removed and SRC_NODE used instead?