How to Install Hadoop in Linux?

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. Its storage layer, HDFS, is modeled on the Google File System. Hadoop needs Java, so we first install Java on our Ubuntu system.

Step 1: Open your terminal and first check whether Java is already installed on your system with the command:

java -version

Step 2: Now it is time to update your system. The first command below refreshes the package lists and the second upgrades the installed packages.

sudo apt-get update
sudo apt-get upgrade

updating Linux system

Step 3: Now we will install the default JDK for java using the following command:



sudo apt-get install default-jdk

When it asks Y/N, press Y.

installing jdk for Hadoop

Step 4: Now check whether Java is installed or not using the command

java -version

checking for java installation
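The checks in Step 1 and Step 4 can be wrapped in a small guard so that the same snippet works before and after the JDK is installed. This is only a minimal sketch; the helper name have_cmd is made up for illustration and is not part of Java or Hadoop.

```shell
# Return success if the named command is on PATH.
# (have_cmd is a hypothetical helper name.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Prints the installed Java version.
    java -version
else
    echo "java not found; install a JDK first (for example: sudo apt-get install default-jdk)"
fi
```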

Step 5: Once Java is installed, we create a dedicated user for Hadoop. This is not required, but it is good practice to run Hadoop under its own user. Use the following commands:

sudo addgroup hadoop

adding a user for Hadoop - 1

sudo adduser --ingroup hadoop hadoopusr

adding a user for Hadoop - 2

Step 6: After running the above two commands, you have created a dedicated user named hadoopusr. The second command asks for a new UNIX password, so choose one that suits you (the terminal does not show the characters as you type, so remember what you enter). It then asks for information such as Full Name. Keep pressing Enter to accept the defaults, then press Y to confirm that the information is correct.



adding user information for Hadoop Installation User

Step 7: Now use the following command:

sudo adduser hadoopusr sudo

With this command, you add ‘hadoopusr’ to the ‘sudo’ group so that it can run commands as the superuser.

making Hadoop user to superuser in Linux

Step 8: Now we also need to install the OpenSSH server (SSH stands for Secure Shell).

sudo apt-get install openssh-server

installing ssh key

Step 9: Now switch to the new user, hadoopusr, with the command below, and enter the password you chose above:

su - hadoopusr

switching to Hadoop user

Step 10: Now generate an SSH key, because Hadoop requires SSH access to manage its nodes, whether they run on remote machines or on the local machine. For our single-node setup, we configure SSH so that we have access to localhost.

ssh-keygen -t rsa -P ""

After this command, simply press Enter to accept the default file location.



generating ssh key for Hadoop user

Step 11: Now add the public key to the authorized_keys file of the machine you want to access with SSH keys, which here is the same machine:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

add the public key of the computer to the authorized key file in Hadoop installation

Step 12: Now test the connection to localhost with the command below. Type yes to continue, enter your password if it asks, and then type exit.

ssh localhost

testing ssh localhost - 1

testing ssh localhost - 2
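If ssh localhost keeps asking for a password even after Step 11, one common cause is that the .ssh directory or the authorized_keys file is writable by other users, which the SSH server rejects in its default strict mode. The sketch below tightens those permissions. The function name secure_ssh_dir is hypothetical and chosen only for this example.

```shell
# Tighten permissions on an SSH directory and its authorized_keys file.
# Argument: path to the .ssh directory (defaults to $HOME/.ssh).
# (secure_ssh_dir is a hypothetical helper name.)
secure_ssh_dir() {
    ssh_dir="${1:-$HOME/.ssh}"
    [ -d "$ssh_dir" ] || return 1
    chmod 700 "$ssh_dir"
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}

# Example, run as hadoopusr:
#   secure_ssh_dir "$HOME/.ssh"
# Then test a login that is not allowed to fall back to a password prompt:
#   ssh -o BatchMode=yes localhost true && echo "key-based login works"
```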

Now you have completed the basic requirements for Hadoop installation.

Step 13: Now download the package that you are going to install. Download the hadoop-2.9.0.tar.gz file from the Hadoop-2.9.0 download page.

downloading hadoop
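Before extracting a large download, it can help to confirm that the file arrived intact by comparing its SHA-256 digest with the one published alongside the release. The following is a rough sketch that assumes the sha256sum tool is available. The function name verify_sha256 is hypothetical, and the expected digest must be copied from the download page, since none is given here.

```shell
# Compare a file's SHA-256 digest with an expected hex string.
# Arguments: file path, expected digest in lowercase hex.
# (verify_sha256 is a hypothetical helper name.)
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example (replace the placeholder with the digest from the download page):
#   verify_sha256 hadoop-2.9.0.tar.gz "<expected-digest>" && echo "checksum OK"
```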

Step 14: Once you have downloaded hadoop-2.9.0.tar.gz, place the tar file in your preferred location and then extract it with the command below. In my case I moved it to the /Documents folder.



extracting downloaded Hadoop File - 1

Now extract the file with the command below and enter your hadoopusr password when prompted. If you do not know the password, you can switch to your own user and change it.

sudo tar xvzf hadoop-2.9.0.tar.gz

extracting downloaded Hadoop File - 2

Step 15: Now move the extracted folder to /usr/local/hadoop with the command below (make sure your extracted folder is named hadoop; the archive normally extracts to a folder named hadoop-2.9.0, so rename it first if needed):

sudo mv hadoop /usr/local/hadoop

Step 16: Now change the ownership of the installation to hadoopusr with this command:

sudo chown -R hadoopusr /usr/local

changing ownership in Hadoop Installation

Step 17: This is the most important step: we now configure several files.

First we configure the ~/.bashrc file. To open it, type the command below:

sudo gedit ~/.bashrc

configuring ./bashrc in Hadoop Installation

When the ~/.bashrc file opens, copy the lines below into it (change the Java version to match the one on your PC; for example, it might be java-8-openjdk-amd64).



          
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

configuring ./bashrc in Hadoop Installation

Then reload the file so that the new settings take effect in the current terminal:

source ~/.bashrc

checking the configuring of ./bashrc in Hadoop Installation
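Running source only reloads the file; it does not report whether the paths are valid. A small check like the one below can confirm that the two key variables point at real directories. This is a sketch under the assumption that only JAVA_HOME and HADOOP_HOME need checking, and the function name check_hadoop_env is invented for illustration.

```shell
# Report whether the variables set in ~/.bashrc point at real directories.
# Returns non-zero if any variable is unset or its directory is missing.
# (check_hadoop_env is a hypothetical helper name.)
check_hadoop_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return "$status"
}

check_hadoop_env || echo "Fix ~/.bashrc, then run: source ~/.bashrc"
```

If both variables check out, running hadoop version in the same terminal should print the installed release.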

Step 18: Before configuring more files, confirm which version of Java is installed. Go to the directory /usr/lib/jvm and run the ls command to list its contents, then note the Java version directory. In my case it is java-11-openjdk-amd64.

checking java version
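As an alternative to listing /usr/lib/jvm by hand, the JDK directory can usually be derived from the java command itself by resolving its symlinks. The sketch below assumes GNU readlink (standard on Ubuntu), and the helper name java_home_from_bin is hypothetical. On some Java 8 layouts the resolved path ends in jre/bin/java, so the result may need its trailing /jre removed.

```shell
# Strip the trailing /bin/java from a resolved java path to get a JAVA_HOME candidate.
# (java_home_from_bin is a hypothetical helper name.)
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Resolve the symlink chain behind the java command, if java is installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```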

Step 19: Now we will configure hadoop-env.sh. Open the file using the command below.

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

configuring hadoop-env.sh file

Once the file opens, copy the export line below into it and comment out the existing export line that sets JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

configuring hadoop-env.sh file

Don’t forget to save.



Step 20: Now we will configure core-site.xml. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

configure the core-site.xml

Once the file opens, copy the text below inside the <configuration> tag:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>


See the below image for better understanding:

configure the core-site.xml
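To make the placement concrete, the sketch below writes a complete core-site.xml with the property sitting inside the configuration element. It writes to a scratch file rather than the real configuration directory so that the result can be inspected first. The function name write_core_site and the scratch path are assumptions made for this example.

```shell
# Write a complete core-site.xml to the given path.
# (write_core_site is a hypothetical helper name.)
write_core_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Example: write to a scratch file and inspect it before copying it into place.
write_core_site /tmp/core-site.example.xml
cat /tmp/core-site.example.xml
```

The other three XML files in the following steps use the same layout: one configuration element wrapping one or more property elements.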

Step 21: Now we will configure hdfs-site.xml. Open the file using the command below.

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

configuring the hdfs-site.xml file

Once the file opens, copy the text below inside the <configuration> tag:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>


See the below image for better understanding:

 configuring the hdfs-site.xml file



Step 22: Now we will configure yarn-site.xml, which holds the settings for YARN, the component responsible for scheduling and running jobs in the Hadoop environment. Open the file using the command below:

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

yarn-site.xml file configuration

Once the file opens, copy the text below inside the <configuration> tag:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


See the below image for better understanding:

yarn-site.xml file configuration

Step 23: The last file to configure is mapred-site.xml. Hadoop ships only a template, mapred-site.xml.template, so we need to copy that file and give the copy the name mapred-site.xml.

The template is in /usr/local/hadoop/etc/hadoop/. To copy and rename it in a single step, use the following command:

sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

mapred-site.xml file configuration

Once the file is copied, open it using the following command:

sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

mapred-site.xml file configuration



And then place the below content inside its configuration tag.

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>


See the below image for better understanding:

mapred-site.xml file configuration

Step 24: Now we have successfully configured all the files, so it is time to check our installation. The Hadoop architecture has a NameNode and DataNodes, and each needs a local storage directory. We therefore create the directory hadoop_tmp, matching the paths set in hdfs-site.xml, and inside it an hdfs directory containing namenode and datanode directories. The commands to create the directories are given below:

sudo mkdir -p /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode

Now give hadoopusr ownership of these directories with the command below:

sudo chown -R hadoopusr /usr/local/hadoop_tmp

Running Hadoop

1. First, we need to format the NameNode. Run the command below only the first time you start the cluster; if you run it again, all your metadata will be erased.

hdfs namenode -format

formatting namenode in Hadoop
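Because formatting a second time erases the metadata, a guard that checks for an existing format can be useful. A formatted NameNode directory contains a current/VERSION file, so the sketch below tests for it and only reports what to do, without running the format itself. The function name needs_format is hypothetical, and the directory is the one configured in hdfs-site.xml above.

```shell
# Succeed when the NameNode directory has not been formatted yet.
# A formatted directory contains current/VERSION.
# (needs_format is a hypothetical helper name.)
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

NAMENODE_DIR=/usr/local/hadoop_tmp/hdfs/namenode   # path from hdfs-site.xml
if needs_format "$NAMENODE_DIR"; then
    echo "No existing metadata found; it is safe to run: hdfs namenode -format"
else
    echo "Already formatted; skipping format to keep existing metadata"
fi
```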

2. Now we need to start the DFS i.e. Distributed File System.

start-dfs.sh

starting DFS in Hadoop



3. Now the last thing you need to start is YARN.

start-yarn.sh

starting yarn in Hadoop

4. Now use the following command:

jps

Now you should see SecondaryNameNode, NodeManager, ResourceManager, NameNode, Jps, and DataNode listed, which means you have successfully installed Hadoop.

using jps command
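Rather than scanning the jps output by eye, the list can be checked automatically. The sketch below reads jps output and prints the name of any expected daemon that is missing, so empty output means everything is running. The function name missing_daemons is made up for this example, and the list of daemons is taken from the sentence above.

```shell
# Read `jps` output on stdin and print any expected daemon that is absent.
# (missing_daemons is a hypothetical helper name.)
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; match the name as the whole second field.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Run the check only where jps is available.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

If a daemon is reported as missing, the log files under /usr/local/hadoop/logs are the usual place to look for the cause.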

5. You have successfully installed Hadoop on your system. To check your cluster information, open localhost:50070 in your browser. The interface will look like this:

Hadoop Interface in Browser



