This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Add EC performance test scripts #2059

Merged
merged 10 commits into from
Jun 12, 2019

Conversation

rui-mo (Collaborator) commented on Apr 28, 2019

No description provided.

PHILO-HE (Member) commented on May 7, 2019

PHILO-HE (Member) commented on May 16, 2019

Please remove the unnecessary log file and the unnecessary empty lines. Test-specific information, such as hostnames, should also be removed. Please consider adding comments to make the code easier to read, and add a README.md that explains how to use these scripts.

PHILO-HE changed the title from "add ec performance test" to "Add EC performance test scripts" on May 21, 2019
Resolved (outdated) review threads:
supports/ec-performance-test/README.md (5 threads)
supports/ec-performance-test/config (2 threads)
supports/ec-performance-test/drop_cache.sh
supports/ec-performance-test/prepare.sh
supports/ec-performance-test/test_distcp_ec.sh
num=${CASES[$size]}
dir="${size}_${num}"
ssh ${REMOTE_NAMENODE} "hdfs dfs -mkdir /${dir}"
hadoop jar /root/rui/hadoop-3.2.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.2.0-SNAPSHOT-tests.jar TestDFSIO -write -nrFiles $(($num)) -size ${size}
Review comment (Member):

Please generalize this command so that it does not depend on a local path or a specific version, e.g.: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO ...
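One way to follow this suggestion is to resolve the tests jar with a shell glob before invoking TestDFSIO. The sketch below is illustrative only and is not taken from the merged scripts: the function names `find_tests_jar` and `run_dfsio_write` are hypothetical, and `HADOOP_HOME` is assumed to be set in the environment.

```shell
# Hypothetical helper: print the first jar under the given Hadoop home that
# matches the version-independent pattern, or fail if none exists.
find_tests_jar() {
  for candidate in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar; do
    # If the glob matches nothing, the pattern is left as-is and -e is false.
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper: run the TestDFSIO write test with the resolved jar.
# $1 = number of files, $2 = file size (same flags as the reviewed command).
run_dfsio_write() {
  tests_jar=$(find_tests_jar "$HADOOP_HOME") || {
    echo "TestDFSIO tests jar not found under HADOOP_HOME" >&2
    return 1
  }
  hadoop jar "$tests_jar" TestDFSIO -write -nrFiles "$1" -size "$2"
}
```

Failing early when no jar matches avoids passing the unexpanded pattern to `hadoop jar`, which would otherwise produce a less obvious error.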

rui-mo force-pushed the ec-test branch 3 times, most recently from 8c6ad31 to 9439983, on May 29, 2019 03:14
# The namenode's hostname of src cluster
SRC_NODE=

# The namenode's hostname for remote HDFS cluster
Review comment (Member):

Can this variable be removed, with SRC_NODE used instead?

num=${CASES[$size]}
dir="${size}_${num}"
ssh ${REMOTE_NAMENODE} "hdfs dfs -mkdir /${dir}"
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.2.0-SNAPSHOT-tests.jar TestDFSIO -write -nrFiles $(($num)) -size ${size}
Review comment (Member):

We can use hadoop-mapreduce-client-jobclient-*-tests.jar instead.

# The hosts require dropping cache, e.g., HOSTS="host1 host2"
HOSTS=

# the number of mapper for distcp, e.g., MAPPER_NUM="30 60 90"
Review comment (Member):

Add a comment: For the sake of fair comparison, please make sure this value is consistent with the total number of cmdlet executors in SSM.

PHILO-HE (Member) commented on Jun 4, 2019

Please add to README.md any notes that you think are necessary for running the test.
For example:

The reason why we set a large time interval in the SSM rule.

For the sake of fair comparison, DistCP and SSM should have the same parallelism, i.e., the number of mappers for DistCP should be consistent with the total number of executors (smart.cmdlet.executors * num_smart_agent) for SSM. The parallelism is 90 in our test, and you should specify this value in the DistCP command. For SSM, given that there are 9 smart agents, you should set smart.cmdlet.executors to 10 in smart-default.xml.

You should drop the cache after each round of testing to avoid any impact from the OS cache. This operation can be included in your test scripts.
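The parallelism and cache-dropping notes above can be sketched in shell as follows. This is a minimal illustration under stated assumptions, not the content of the merged drop_cache.sh: the function names and the `SSH_CMD` override are hypothetical, `HOSTS` follows the space-separated form shown in the config comment, and writing to /proc/sys/vm/drop_caches assumes Linux hosts and root privileges on the remote side.

```shell
# Total SSM parallelism = smart.cmdlet.executors * number of smart agents.
# $1 = executors per agent, $2 = number of agents.
ssm_parallelism() {
  echo $(( $1 * $2 ))
}

# Drop the OS page cache on every host listed in HOSTS (space-separated).
# SSH_CMD is a hypothetical override; setting it to "echo" gives a dry run
# that only prints what would be executed on each host.
drop_caches() {
  for host in $HOSTS; do
    ${SSH_CMD:-ssh} "$host" "sync; echo 3 > /proc/sys/vm/drop_caches"
  done
}

# Example for the setup described above: 10 executors on each of 9 agents.
# The result is the value to pass to DistCp as its mapper count (-m).
MAPPERS=$(ssm_parallelism 10 9)
```

Calling `drop_caches` between rounds keeps each run from reading data that a previous run left in memory. Running `sync` first flushes dirty pages so that more of the cache can actually be released.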

PHILO-HE merged commit 763a6af into Intel-bigdata:trunk on Jun 12, 2019

2 participants