Apache HIVE – Features And Limitations

Apache Hive is a data warehousing tool built on top of Hadoop and used to extract meaningful information from data. Data warehousing means storing data generated by different sources in one location. Data generally comes in three forms: structured (for example, SQL database tables), semi-structured (XML or JSON), and unstructured (music or video). Hive runs on top of Hadoop to process structured data held in tabular format, and it can query datasets that reach petabytes (PB) in size.

MapReduce is the default programming model on Hadoop, typically written in Java or another programming language, so Hive was designed mainly for developers who are comfortable with SQL. With Hive, people who are not comfortable with Java can also process data on Hadoop. Hive also makes structured data easier to query, because an equivalent Java program is harder to write than a Hive query. The query language used with Hive is HQL (HiveQL), whose syntax closely resembles SQL, which makes Hive easy to pick up.
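
To make the contrast concrete, here is a minimal sketch in plain Python (not Hadoop code) of the map, shuffle, and reduce steps a programmer would otherwise write by hand for a simple grouped sum. The sample rows, the `employees` table, and the column names in the comment are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (department, salary)
ROWS = [("sales", 100), ("eng", 200), ("sales", 50), ("eng", 300)]


def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    for dept, salary in rows:
        yield dept, salary


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in groups.items()}


def total_salary_by_dept(rows):
    return reduce_phase(shuffle(map_phase(rows)))


# In HiveQL the same aggregation is one declarative statement, for example:
#   SELECT dept, SUM(salary) FROM employees GROUP BY dept;
```

The query states what result is wanted, and Hive decides how to turn it into jobs on the underlying engine.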

Apache Hive Features   

| Feature | Explanation |
| --- | --- |
| Supported computing engines | Hive supports the MapReduce, Tez, and Spark computing engines. |
| Framework | Hive is a stable batch-processing framework built on top of the Hadoop Distributed File System (HDFS) and can work as a data warehouse. |
| Easy to code | Hive uses the Hive query language (HQL) to query structured data, which is easy to code. A query that takes around 100 lines of Java can often be reduced to about 4 lines of HQL. |
| Declarative | HQL is a declarative language like SQL, meaning it is non-procedural. |
| Structure of table | Table structure is similar to that of an RDBMS. Tables also support partitioning and bucketing. |
| Supported data structures | Tables, partitions, and buckets are the 3 data structures that Hive supports. |
| Supports ETL | Apache Hive supports ETL, i.e. Extract, Transform, and Load. Before Hive, Python was used for ETL. |
| Storage | Hive lets users access files stored in HDFS, Apache HBase, Amazon S3, etc. |
| Capable | Hive can process very large datasets, up to petabytes in size. |
| Helps in processing unstructured data | Custom MapReduce code can be embedded in Hive to process unstructured data. |
| Drivers | JDBC/ODBC drivers are available for Hive. |
| Fault tolerance | Hive data is stored on HDFS, so fault tolerance is provided by Hadoop. |
| Areas of use | Hive can be used for data mining, predictive modeling, and document indexing. |
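
Partitioning and bucketing, mentioned in the table above, can be pictured with a small Python sketch. Partitioning places rows in one subdirectory per partition-column value (Hive uses the `column=value` directory naming), while bucketing assigns each row to one of a fixed number of buckets by hashing a column. The warehouse path, the column names, the bucket count, and the CRC32 hash below are assumptions chosen for illustration; Hive applies its own hash function, so real bucket numbers will differ.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count for the illustration


def partition_dir(table_path, country):
    """Partitioning: one subdirectory per distinct partition-column value."""
    return f"{table_path}/country={country}"


def bucket_index(user_id, num_buckets=NUM_BUCKETS):
    """Bucketing: hash the bucketing column, then take it modulo the bucket count.

    CRC32 is a stand-in here, not the hash function Hive itself uses.
    """
    return zlib.crc32(str(user_id).encode("utf-8")) % num_buckets


def locate(table_path, country, user_id):
    """Return the partition directory and bucket number a row would fall into."""
    return partition_dir(table_path, country), bucket_index(user_id)
```

For example, `locate("/warehouse/sales", "US", 42)` returns the directory `/warehouse/sales/country=US` together with a bucket number between 0 and 3. A query that filters on `country` only needs to read the matching directory, which is the main benefit of partitioning.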

Apache Hive Limitations

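| Limitation | Explanation |
| --- | --- |
| Does not support OLTP | Apache Hive doesn’t support online transaction processing (OLTP); it supports online analytical processing (OLAP). |
| Limited subquery support | Subqueries are supported only in certain positions: in the FROM clause and, since Hive 0.13, in the WHERE clause with IN, NOT IN, EXISTS, and NOT EXISTS. |
| Latency | Hive queries have high latency. |
| Only non-real-time (cold) data is supported | Hive is not used for real-time data querying since it takes a while to produce a result. |
| Limited transaction processing | HQL is not designed for transaction processing. Row-level INSERT, UPDATE, and DELETE are available since Hive 0.14, but only on tables configured as transactional (ACID). |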

Last Updated : 29 Sep, 2022