
Install Hadoop on Ubuntu

by Tanmay Deshpande

1. Copy the Java and Hadoop installables to any folder where the user has sufficient permissions. Here in my case I am copying them under /usr/local/

Download link for Java
Download link for Hadoop

2. Untar/unzip Java and Hadoop using the following commands:

$sudo tar -xzf java-7-oracle.tar.gz
$sudo tar -xzf hadoop-1.0.0.tar.gz

3. Rename the folders to some meaningful names. Here I renamed the folders to java and hadoop, as shown below.
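For example, assuming the archives extracted to java-7-oracle and hadoop-1.0.0 under /usr/local (the extracted folder names may differ depending on the archive):

$sudo mv /usr/local/java-7-oracle /usr/local/java
$sudo mv /usr/local/hadoop-1.0.0 /usr/local/hadoop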

4. Now it's time to export the environment variables by adding entries to the ~/.bashrc file. The ~/.bashrc file is a script that runs every time the user logs in, and it is located under the user's home directory.

You can append the following entries to the .bashrc file.
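A minimal sketch of the entries, assuming Java and Hadoop were moved to /usr/local/java and /usr/local/hadoop in step 3:

# Assumed install locations from step 3
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
# Make the java and hadoop binaries available on the PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin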

5. Now close the terminal and restart it so that the .bashrc changes take effect. You can run the following commands to verify the Java and Hadoop installations.
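For example, the following commands print the installed versions if the PATH entries are correct:

$java -version
$hadoop version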

6. Once this is done, you can configure ssh. Configuring ssh is a two-step process: first generate the keys, then copy the public key to the authorized_keys file. Here are the commands you need to run to do so,
$ssh-keygen -t rsa -P ""
$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
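You can then verify passwordless ssh by connecting to localhost; it should log you in without asking for a password:

$ssh localhost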

7. Once done, the next step is to configure Hadoop. All Hadoop configuration files are under the $HADOOP_HOME/conf folder. Hadoop configuration requires changes to the following three files:
1. hadoop-env.sh - In this file we need to set JAVA_HOME. The file already contains a placeholder for JAVA_HOME which is commented out, so you just need to search for it and uncomment the line.
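The uncommented line should point at your Java folder, for example (the path here assumes the folder name from step 3):

export JAVA_HOME=/usr/local/java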

2. core-site.xml - This file holds the configuration for HDFS. Here we need to configure a minimum of two properties, viz. hadoop.tmp.dir and fs.default.name, as shown below. The values here are typical for a single-node setup; adjust the temp directory and the host/port to your environment.

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

3. mapred-site.xml - This file is specific to the Job Tracker settings. We should set the mapred.job.tracker property as follows; the host/port value is a typical single-node choice and can be adjusted to your environment.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>

8. Now we should manually create the folder we configured as hadoop.tmp.dir in step 7.2 and give full rights to that folder.
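For example, assuming hadoop.tmp.dir is set to /usr/local/hadoop/tmp as in step 7.2 (replace hduser with your Ubuntu user name):

$sudo mkdir -p /usr/local/hadoop/tmp
$sudo chown hduser:hduser /usr/local/hadoop/tmp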

9. Now format the NameNode so that it creates all the required folder structure, as shown below.
$hadoop namenode -format

10. And the last step is to start all daemons as shown below.
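In Hadoop 1.x this is done with the start-all.sh script under $HADOOP_HOME/bin, which starts the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker daemons in one go:

$start-all.sh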

11. You can verify that all daemons have started by running the jps command, as shown below.
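$jps

jps comes with the JDK and lists the running Java processes; if everything started correctly you should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker in the output (along with Jps itself).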

