
Hadoop single node setup

(how to install Hadoop in pseudo-distributed mode)

There are three supported modes in which to install Hadoop:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

We choose to install the pseudo-distributed mode, because it is the closest to the fully-distributed one.

Downloading Hadoop

Choose a version: http://hadoop.apache.org/releases.html#Download

Windows

We suggest installing a Linux distribution in Oracle VM VirtualBox, so that you can work virtually in a safe way.

Linux

The following steps are just a few hints: more a guideline than precise directives.

  1. Install the Java Development Kit.

  2. Create a new environment variable called JAVA_HOME pointing to the JDK, and add its bin directory to your PATH variable.
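     For example, in ~/.bashrc (a sketch; the JDK path is only an example and depends on your distribution and JDK version):

       export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
       export PATH=$PATH:$JAVA_HOME/bin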

  3. Install ssh and create new keys, then enable a passwordless connection with localhost:

  • ssh-keygen (don't set up a passphrase)
  • ssh-copy-id -i .ssh/id_rsa.pub localhost
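     Put together (a sketch; accept the default key location and leave the passphrase empty):

       ssh-keygen -t rsa
       ssh-copy-id -i .ssh/id_rsa.pub localhost
       ssh localhost    # should now log in without asking for a password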
  4. Download a stable version of Hadoop (preferably 0.X or 1.X).

  5. Unpack it in your file system and create a symbolic link to it (Ex.: ln -s hadoop-X.X.X hadoop).
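     For example (a sketch; hadoop-1.2.1 stands in for whatever version you downloaded, and /opt is an arbitrary choice of location):

       tar xzf hadoop-1.2.1.tar.gz -C /opt
       cd /opt
       ln -s hadoop-1.2.1 hadoop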

  6. Create a HADOOP_HOME environment variable with the path of the link above, and add its bin directory to your PATH variable.
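     Again in ~/.bashrc (a sketch, assuming the /opt/hadoop link from the previous step):

       export HADOOP_HOME=/opt/hadoop
       export PATH=$PATH:$HADOOP_HOME/bin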

  7. Uncomment and modify export JAVA_HOME=... in conf/hadoop-env.sh.
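     The resulting line would look like this (the JDK path is only an example):

       export JAVA_HOME=/usr/lib/jvm/java-6-openjdk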

  8. Create a folder for temporary files, for instance called tmp (it has to contain the subfolders dfs/name/). Add these properties to conf/core-site.xml with the right path of the tmp folder:

    <property>
      <name>hadoop.tmp.dir</name>
      <!-- Ex.: /opt/hadoop/tmp with subfolders dfs/name/ -->
      <value>......./tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation.
      The uri's scheme determines the config property (fs.SCHEME.impl)
      naming the FileSystem implementation class. The uri's authority
      is used to determine the host, port, etc. for a filesystem.
      </description>
    </property>
  9. Add this property to conf/hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication. The actual number of
      replications can be specified when the file is created. The
      default is used if replication is not specified in create time.
      </description>
    </property>
  10. Add this property to conf/mapred-site.xml:

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:54311</value>
      <description>The host and port that the MapReduce job tracker
      runs at. If "local", then jobs are run in-process as a single
      map and reduce task.</description>
    </property>
  11. Format HDFS with this command: hadoop namenode -format

  12. Now you can start HDFS and MapReduce (start-dfs.sh and start-mapred.sh).
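A quick sanity check (a sketch; jps ships with the JDK, and this daemon list is the one expected for Hadoop 1.X in pseudo-distributed mode):

    start-dfs.sh
    start-mapred.sh
    jps    # should list NameNode, DataNode, SecondaryNameNode,
           # JobTracker and TaskTracker (plus Jps itself)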

References