Apache Hive Installation and Configuring MySql Metastore for Hive

Apache Hive was initially developed by Facebook and first released in 2010. It is a data warehousing package used for data analysis. Hive manages and queries structured data with its own query language, known as HQL or HiveQL, which is quite similar to SQL. Hadoop's native programming model is MapReduce, which requires writing Java code, so Facebook's developers introduced Hive to make Hadoop accessible to users who are more comfortable with SQL than with Java.
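
To see how close HiveQL is to SQL, here is a purely illustrative query run from the shell; the sales table is hypothetical, and the command assumes Hive is already installed (Step 16 below).

# hypothetical example: total sales per product, written in HiveQL but reading like plain SQL
hive -e "SELECT product, SUM(amount) AS total_amount FROM sales GROUP BY product;"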

Features Of Hive:

  • Data is stored in HDFS
  • Custom MapReduce code can be plugged in easily
  • Fault tolerance is provided by Hadoop
  • JDBC/ODBC drivers are available for Hive

The Metastore keeps a record of database schemas and related information such as table metadata, the columns in each table and their data types, and so on. By default, Hive stores the Metastore in an embedded Derby database, which is fine for testing, but for production it is better to use a MySQL-backed Metastore. The major disadvantage of Derby is that it allows only a single Hive session to access the Metastore at a time, which is why MySQL is used for production.

Prerequisite: Hadoop and Java must already be installed before you install Apache Hive.

Steps to Download and Configure MySql Metastore for Hive

Step 1: Download Hive from its official downloads page and grab the binary tar file shown in the image below, which is about 143 MB in size.

[Image: Hive download page]

Once the file has been downloaded to your Downloads folder, go to that directory, extract it with the command below, and then move it to your desired location. In my case, I am moving it to the Documents folder, since my Hadoop installation also lives there.

tar -xvf apache-hive-2.1.1-bin.tar.gz
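
If you prefer the terminal, the same archive can be fetched from the Apache archive and moved into place; the URL below assumes version 2.1.1, so adjust it to the version you actually download.

wget https://archive.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
tar -xvf apache-hive-2.1.1-bin.tar.gz
# move it next to the Hadoop installation (adjust the destination to your setup)
mv apache-hive-2.1.1-bin ~/Documents/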

Step 2: Once Hive is downloaded, install the MySQL Java connector using the command below.

sudo apt-get install libmysql-java

Then create a symbolic link so that the connector jar is available in Hive's lib folder.

sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
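
Note that $HIVE_HOME is only added to .bashrc in Step 14; if it is not set yet, substitute the full path to your Hive folder. You can confirm the jar and the link with something like:

# the connector installed by libmysql-java
ls -l /usr/share/java/mysql-connector-java.jar
# the symlink inside Hive's lib folder (adjust the path to your Hive location)
ls -l ~/Documents/apache-hive-2.1.1-bin/lib/mysql-connector-java.jar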

Step 3: Move to the lib folder inside your apache-hive-2.1.1-bin folder and delete the file log4j-slf4j-impl-2.4.1.jar. This file is removed because the same SLF4J binding is already present in the Hadoop installation, and having both on the classpath sometimes causes an error.
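
A minimal way to do this from the terminal, assuming Hive was extracted to ~/Documents (adjust the path otherwise):

# remove the duplicate SLF4J binding so it does not clash with the copy shipped with Hadoop
rm ~/Documents/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar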

Step 4: Now start Hadoop with the commands below:

start-dfs.sh
start-yarn.sh

You can confirm that all the daemons have started properly by running the jps command.

[Image: jps output showing the running daemons]
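
On a typical single-node setup you would expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager to show up; for example:

# list running Hadoop daemons
jps
# or check for one daemon in particular
jps | grep -i namenode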

Step 5: Once Hadoop is up, create the HDFS directories that Hive needs by running the commands below in your terminal.

hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp/hive

The warehouse directory is where Hive stores the data for all of its databases and tables. These are Hive's default paths, so nothing extra needs to be configured for them.

Step 6: Now change the permissions on these directories with the commands below.

hdfs dfs -chmod 777 /tmp/
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /tmp/hive
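
You can confirm that the directories exist and now carry the wide-open permissions with:

# each entry should show drwxrwxrwx
hdfs dfs -ls -d /tmp /tmp/hive /user/hive/warehouse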

Step 7: Now install the MySQL server with the command below.

sudo apt-get install mysql-server
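
When the installation finishes, you can make sure the server is present and running (the service name is assumed to be mysql, as on Ubuntu):

mysql --version
sudo systemctl status mysql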

Step 8: Create the Metastore database. First open the MySQL shell with the command below (enter your MySQL root password when prompted; in this setup it is root):

sudo mysql -u root -p

Then, at the mysql> prompt, create and select the Metastore database:

mysql> CREATE DATABASE metastore_db;
mysql> USE metastore_db;

Change {user-name} in the path below to your own Linux username, and adjust the rest of the path if Hive is installed somewhere else.

mysql> SOURCE /home/{user-name}/Documents/apache-hive-2.1.1-bin/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;
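
If the script runs without errors, the Metastore tables now exist in metastore_db; a quick check from the shell (it prompts for the MySQL root password) might look like:

# you should see tables such as DBS, TBLS and COLUMNS_V2
sudo mysql -u root -p -e "USE metastore_db; SHOW TABLES;"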

Step 9: Now create a Hive user and password with the commands below in the MySQL terminal. (The password is set in CREATE USER rather than in the GRANT statement, so the commands also work on newer MySQL versions.)

mysql> CREATE USER 'hiveusr'@'%' IDENTIFIED BY 'hivepassword';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hiveusr'@'%';
mysql> FLUSH PRIVILEGES;

In my case, the Hive user is hiveusr and the password is hivepassword. In hive-default.xml.template the default connection username is APP and the default password is mine; you can choose a different username and password if you want, as long as you use the same ones in hive-site.xml below.

[Image: default connection username and password in hive-default.xml.template]

Then type exit to quit the MySQL terminal.

mysql> exit
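
You can confirm that the new account works by logging in as the Hive user (enter hivepassword when prompted):

# metastore_db should appear in the list
mysql -u hiveusr -p -e "SHOW DATABASES;"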

Step 10: Now go to the conf folder inside apache-hive-2.1.1-bin and rename hive-default.xml.template to hive-site.xml and hive-env.sh.template to hive-env.sh.
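
From the terminal, the renames look like this, assuming Hive lives in ~/Documents:

cd ~/Documents/apache-hive-2.1.1-bin/conf
mv hive-default.xml.template hive-site.xml
mv hive-env.sh.template hive-env.sh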

Step 11: Now configure Hive. Open hive-site.xml and change the following properties (use Ctrl+F to find a property in the file).

A: ConnectionURL

<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore_db?createDatabaseIfNotExist=true</value>

// Points Hive at the metastore_db database created in Step 8; adjust the host if MySQL runs elsewhere.

B: ConnectionUserName

<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveusr</value>

// Change the username in the value if you chose a different one above.

C: ConnectionPassword

<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>

// Change the password in the value if you chose a different one above.

D: ConnectionDriverName

<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
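
After editing, a quick grep over hive-site.xml (run from the conf folder) confirms the four connection properties and the values you set:

grep -n -A1 "javax.jdo.option.Connection" hive-site.xml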


Step 12: Now open hive-env.sh and append your Hadoop path to it.

export HADOOP_HOME=/home/dikshant/Documents/hadoop
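
If you prefer the terminal to an editor, the same line can be appended from the conf folder (adjust the path to your own Hadoop installation):

echo 'export HADOOP_HOME=/home/dikshant/Documents/hadoop' >> hive-env.sh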

Step 13: Also replace the property values below in hive-site.xml (search for each property name with Ctrl+F).

A: Replace this property

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>


With this property

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/${user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>


B: Replace this property

<property>
    <name>hive.downloaded.resources.dir</name>
    <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>


With this property

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${user.name}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
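
The ${system:java.io.tmpdir} placeholder also appears in a few other properties of the template (for example hive.querylog.location). If you hit similar errors, an optional one-shot replacement like the one below, run from the conf folder, rewrites all occurrences at once and keeps a .bak backup so you can revert:

# replace every ${system:java.io.tmpdir} with /tmp and ${system:user.name} with ${user.name}
sed -i.bak 's|\${system:java.io.tmpdir}|/tmp|g; s|\${system:user.name}|${user.name}|g' hive-site.xml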


Step 14: The most important part is to set the Hive path in your .bashrc file, so open .bashrc with the command below.

sudo gedit ~/.bashrc

Add the lines below to the end of the file, updating the path if your Hive location is different.

#Hive Path
export HIVE_HOME=/home/dikshant/Documents/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin


Then source this file with the below command.

source ~/.bashrc
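
You can check that the new variables took effect with:

# should print your Hive folder and resolve hive to $HIVE_HOME/bin/hive
echo $HIVE_HOME
which hive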

Step 15: Now run the command below to initialize the Metastore schema in the MySQL database.

schematool -initSchema -dbType mysql

[Image: initializing the schema for the MySQL database]
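
If the initialization succeeds, you can double-check the recorded schema version with schematool's info mode:

# prints the Hive distribution and Metastore schema versions it finds
schematool -dbType mysql -info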

Step 16: That's it. Now launch the Hive shell by typing hive in the terminal.

hive

[Image: running the hive command and the resulting Hive shell output]
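
As a final sanity check that the MySQL-backed Metastore works end to end, you can create and drop a throwaway table from the shell (the table name demo_tbl is just an example):

hive -e "CREATE TABLE IF NOT EXISTS demo_tbl (id INT, name STRING); SHOW TABLES; DROP TABLE demo_tbl;"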


