- Prerequisites
- Install Java
- Install
if needed
- Download and extract a stable Hadoop release (this guide uses 2.7.7):
```shell
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -xvf hadoop-2.7.7.tar.gz
```
- Add the following lines to `~/.bashrc`:
```shell
export JAVA_HOME=/usr
export HADOOP_HOME=$HOME/hadoop-2.7.7
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin
# tools.jar provides com.sun.tools.javac.Main, used below to compile MapReduce jobs
export HADOOP_CLASSPATH=/usr/lib/jvm/java-openjdk/lib/tools.jar
```
Apply the changes by running the commands below, then check that `java -version` and `hadoop version` both print version information:
```shell
source ~/.bashrc
java -version
hadoop version
```
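If `hadoop version` fails, checking the exported variables can narrow the problem down. The sketch below is a hypothetical POSIX shell helper (`check_hadoop_env` is not part of Hadoop); it assumes the variable names from the `~/.bashrc` block above and only checks that they point at existing directories and that `$HADOOP_HOME/bin/hadoop` is executable.

```shell
# Hypothetical helper (not part of Hadoop): sanity-check the ~/.bashrc exports.
# Returns 0 if everything looks fine, 1 otherwise; problems go to stderr.
check_hadoop_env() {
  status=0
  for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    # Read the value of the variable named by $var (POSIX-compatible indirection).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "not set: $var" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "not a directory: $var=$value" >&2
      status=1
    fi
  done
  # The hadoop launcher script should be executable under $HADOOP_HOME/bin.
  if [ -n "${HADOOP_HOME:-}" ] && [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "not executable: $HADOOP_HOME/bin/hadoop" >&2
    status=1
  fi
  return "$status"
}

# Usage: source ~/.bashrc && check_hadoop_env && echo "environment looks OK"
```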
- Copy the master node's SSH public key (create one with `ssh-keygen` if none exists) into each slave's authorized keys:
```shell
ssh-copy-id -i $HOME/.ssh/id_rsa.pub username@slave
```
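With more than one slave, the same step can be looped. This is a sketch under assumptions: `distribute_ssh_key` is a hypothetical helper, and the hosts file it reads (one `user@host` per line, `#` for comments) is an illustrative format, not a Hadoop file.

```shell
# Hypothetical helper (not part of Hadoop): copy the master's public key to every host listed.
distribute_ssh_key() {
  hosts_file=$1                 # one user@host per line; lines starting with # are skipped
  key=${2:-$HOME/.ssh/id_rsa}   # private key; its .pub counterpart is what gets copied
  # Create a passphrase-less RSA key pair only if none exists yet,
  # so the Hadoop start scripts can log in without prompting.
  if [ ! -f "$key" ]; then
    mkdir -p "$(dirname "$key")"
    ssh-keygen -t rsa -N "" -f "$key" || return 1
  fi
  # Word splitting over grep's output also skips blank lines.
  for target in $(grep -v '^[[:space:]]*#' "$hosts_file"); do
    # Appends the public key to the target's ~/.ssh/authorized_keys;
    # ssh-copy-id asks for that account's password once.
    ssh-copy-id -i "$key.pub" "$target" || return 1
  done
}
```

For example, with a hypothetical hosts file: `distribute_ssh_key ~/slave-hosts.txt`. Afterwards, `ssh -o BatchMode=yes username@slave true` should succeed without a password prompt.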
- Create master files in
and add the master's IP to it.
- Add slave IPs in
.
- Edit
on both master and slave machines as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
```
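The property entries of these configuration files are not shown above. As a hedged sketch, the hypothetical helper `write_hadoop_conf` below writes a file with the same two header lines plus one `<property>` per `name=value` argument. The example in the final comment assumes this step is `core-site.xml` and that the master resolves as `master` on port 9000; all three are assumptions, not taken from this guide.

```shell
# Hypothetical helper (not part of Hadoop): write a *-site.xml from name=value pairs.
# Note: values are not XML-escaped, so avoid '&' and '<' in them.
write_hadoop_conf() {
  out=$1
  shift
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
    echo '<configuration>'
    for pair in "$@"; do
      name=${pair%%=*}    # text before the first "="
      value=${pair#*=}    # text after the first "="
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "$name" "$value"
    done
    echo '</configuration>'
  } > "$out"
}

# Example (assumed file name, host name, and port):
# write_hadoop_conf "$HADOOP_CONF_DIR/core-site.xml" fs.defaultFS=hdfs://master:9000
```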
- Edit
on the master machine as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
```
- Edit
on slave machines as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
```
- Copy `mapred-site.xml` from the template in the configuration folder, then edit `mapred-site.xml` on both master and slave machines as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
```
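A minimal sketch of a resulting file, assuming the goal is to run MapReduce jobs on YARN: instead of copying the template and editing it by hand, the hypothetical `write_mapred_site` function writes `mapred-site.xml` directly with `mapreduce.framework.name` set to `yarn`. Any other properties your cluster needs are not covered here.

```shell
# Hypothetical helper (not part of Hadoop): write a minimal mapred-site.xml.
# Overwrites any existing mapred-site.xml in the target directory.
write_mapred_site() {
  conf_dir=${1:-$HADOOP_CONF_DIR}
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Submit MapReduce jobs to YARN instead of the default local runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# Usage (on master and every slave): write_mapred_site "$HADOOP_CONF_DIR"
```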
- Edit
on both master and slave machines as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
```
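- Format the NameNode (only on the master machine, and only once: reformatting erases the existing HDFS metadata).
```shell
# In Hadoop 2.x this is a deprecated alias for: hdfs namenode -format
hadoop namenode -format
```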
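Because formatting should happen only once, a guard can help when re-running setup. This sketch assumes `hdfs getconf -confKey` is available (it ships with Hadoop 2.x) and uses the non-deprecated `hdfs namenode -format`; `format_namenode_once` is a hypothetical name. It treats an existing `current/VERSION` file in the first configured NameNode directory as "already formatted".

```shell
# Hypothetical helper (not part of Hadoop): format the NameNode only if it looks unformatted.
format_namenode_once() {
  # First entry of the (possibly comma-separated) NameNode metadata directories.
  name_dir=$(hdfs getconf -confKey dfs.namenode.name.dir | cut -d, -f1)
  name_dir=${name_dir#file://}   # e.g. file:///tmp/x -> /tmp/x
  if [ -z "$name_dir" ]; then
    echo "could not read dfs.namenode.name.dir; not formatting" >&2
    return 1
  fi
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "already formatted: $name_dir; skipping" >&2
    return 0
  fi
  hdfs namenode -format
}
```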
- Start all daemons (only on the master machine), from the Hadoop directory:
```shell
cd $HADOOP_HOME
./sbin/start-dfs.sh
./sbin/start-yarn.sh
```
- Run `jps` on both master and slave machines to check that all daemons are running.
On the master machine, you should see something like this (process IDs will differ):
```
20869 SecondaryNameNode
21206 NodeManager
20553 NameNode
22620 Jps
32093 Server
21069 ResourceManager
20703 DataNode
```
On the slave machine, you should see something like this:
```
1173 Jps
1500 Server
17134 DataNode
1054 NodeManager
```
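Rather than reading the `jps` listing by eye, the hypothetical helper below (not part of Hadoop) compares it against the daemons you expect. It is a sketch that assumes `jps` prints `<pid> <name>` on each line, as in the listings above.

```shell
# Hypothetical helper (not part of Hadoop): read jps output on stdin and
# report any expected daemon that is missing. Returns 1 if any are missing.
check_daemons() {
  running=$(awk '{print $2}')   # keep only the process name column
  missing=0
  for daemon in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "not running: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}

# On the master: jps | check_daemons NameNode SecondaryNameNode ResourceManager
# On a slave:    jps | check_daemons DataNode NodeManager
```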
- Stop all daemons (only on the master machine), from the Hadoop directory:
```shell
cd $HADOOP_HOME
./sbin/stop-dfs.sh
./sbin/stop-yarn.sh
```
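- Run the word-count application: create an input directory in HDFS, upload the input files, run the job, and copy the result back to the local machine:
```shell
hadoop fs -mkdir -p /test/input
hadoop fs -put test-files/input-folder /test/input
hadoop fs -ls /test/input
hadoop jar applications/wc-hadoop.jar wordcount /test/input /test/output
hadoop fs -ls /test/output
# part-r-00000 holds the output of the first reducer
hadoop fs -get /test/output/part-r-00000 output-folder/output.txt
```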
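- Run the ReverseWebLink application the same way. Remove the previous run's files first, because the job fails if `/test/output` already exists:
```shell
hadoop fs -rm -r -f /test/input /test/output
hadoop fs -mkdir -p /test/input
hadoop fs -put test-files/input-folder /test/input
hadoop fs -ls /test/input
hadoop jar applications/rwlg-hadoop.jar ReverseWebLink /test/input /test/output
hadoop fs -ls /test/output
hadoop fs -get /test/output/part-r-00000 output-folder/output.txt
```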
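Both runs follow the same upload, run, and download pattern. A hypothetical wrapper `run_job` (not part of Hadoop) can capture it; it reuses the `/test/input` and `/test/output` paths above and assumes the job writes its result to `part-r-00000`.

```shell
# Hypothetical wrapper (not part of Hadoop): upload input, run a job, fetch its result.
run_job() {
  jar=$1            # e.g. applications/wc-hadoop.jar
  main=$2           # program name or main class passed to the jar
  local_input=$3    # local file or folder to upload
  local_output=$4   # local path for the downloaded result
  # Clear the previous run; the job fails if /test/output already exists.
  hadoop fs -rm -r -f /test/input /test/output
  hadoop fs -mkdir -p /test/input || return 1
  hadoop fs -put "$local_input" /test/input || return 1
  hadoop jar "$jar" "$main" /test/input /test/output || return 1
  mkdir -p "$(dirname "$local_output")"
  hadoop fs -get /test/output/part-r-00000 "$local_output"
}
```

For example: `run_job applications/rwlg-hadoop.jar ReverseWebLink test-files/input-folder output-folder/output.txt`.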
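- To build your own application jar (here from `ReverseWebLink.java`), compile it against the Hadoop classpath and package the classes:
```shell
# Compiles ReverseWebLink.java into ReverseWebLink*.class (needs tools.jar on HADOOP_CLASSPATH)
bin/hadoop com.sun.tools.javac.Main ReverseWebLink.java
# Packages the compiled classes into rwlg.jar
jar cf rwlg.jar ReverseWebLink*.class
```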
Follow the instructions above to run your own .jar applications on Hadoop!