Apache HIVE – Features And Limitations

  • Last Updated : 29 Sep, 2022

Apache Hive is a data warehousing tool built on top of Hadoop and used for extracting meaningful information from data. Data warehousing is about storing data generated by different sources in one location. Data is mostly available in three forms: structured (SQL databases), semi-structured (XML or JSON), and unstructured (music or video). Hive runs on top of Hadoop to process structured data that is available in tabular format, and it can query datasets that are petabytes (PB) in size.

MapReduce is the default programming model on Hadoop, typically written in Java or another programming language, so Hive was designed mainly for developers who are comfortable with SQL. With Hive, people who are not comfortable with Java can also process data on Hadoop. Hive also makes structured data easier to query, because writing the equivalent code in Java takes more effort than writing a Hive query. The query language used with Hive is HQL (HiveQL), whose syntax is very similar to SQL, which makes Hive easy to pick up.
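
To give a feel for how compact a declarative query is, the sketch below runs a small aggregation written in plain SQL that HiveQL should also accept. It is only an illustration: Python's built-in sqlite3 module stands in for Hive here, and the table `page_views` and its columns are hypothetical names chosen for the example.

```python
import sqlite3

# sqlite3 is only a local stand-in for Hive. The query text uses plain
# SQL constructs (SELECT / GROUP BY / ORDER BY) that HiveQL shares.
QUERY = """
SELECT country, COUNT(*) AS view_count
FROM page_views
GROUP BY country
ORDER BY view_count DESC, country
"""

def run_demo():
    """Load a few hypothetical rows and return the aggregated result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id INTEGER, country TEXT)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [(1, "IN"), (2, "US"), (3, "IN"), (4, "DE"), (5, "IN"), (6, "US")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for country, view_count in run_demo():
        print(country, view_count)
```

The query states what result is wanted (views per country) and leaves the grouping and sorting steps to the engine. On Hadoop, Hive would compile such a query into jobs for its execution engine instead of running it in process as sqlite3 does.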

Apache Hive Features   



Supported Computing Engines: Hive supports the MapReduce, Tez, and Spark computing engines.
Framework: Hive is a stable batch-processing framework built on top of the Hadoop Distributed File System (HDFS) and can work as a data warehouse.
Easy To Code: Hive uses the Hive query language (HQL) to query structured data, which is easy to code. Roughly 100 lines of Java code used to query structured data can be reduced to about 4 lines of HQL.
Declarative: HQL is a declarative language like SQL, meaning it is non-procedural.
Structure Of Table: Table structure is similar to that of an RDBMS. Hive also supports partitioning and bucketing.
Supported data structures: Tables, partitions, and buckets are the 3 data structures that Hive supports.
Supports ETL: Apache Hive supports ETL, i.e. Extract, Transform, and Load. Before Hive, Python was used for ETL.
Storage: Hive lets users access files stored in HDFS, Apache HBase, Amazon S3, etc.
Capable: Hive can process very large datasets, up to petabytes in size.
Helps in processing unstructured data: Custom MapReduce code can be embedded in Hive queries to process unstructured data.
Drivers: JDBC/ODBC drivers are available for Hive.
Fault Tolerance: Because Hive data is stored on HDFS, fault tolerance is provided by Hadoop.
Area of uses: Hive can be used for data mining, predictive modeling, and document indexing.
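
The partitioning and bucketing mentioned in the list can be pictured with a small sketch. In Hive, each partition is stored as a directory named `column=value`, and each bucket is a file inside that directory. The Python code below only mimics that layout in memory. The hash used for bucketing is an assumption made for illustration (Hive's real hash depends on the column type and the Hive version), and the column names `country` and `user_id` are hypothetical.

```python
from collections import defaultdict

def bucket_for(user_id: int, num_buckets: int) -> int:
    # Illustrative hash only, NOT Hive's actual function:
    # a non-negative integer key is treated as its own hash.
    return user_id % num_buckets

def layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows by partition directory name, then by bucket number."""
    files = defaultdict(list)
    for row in rows:
        partition_dir = f"{partition_col}={row[partition_col]}"
        bucket = bucket_for(row[bucket_col], num_buckets)
        files[(partition_dir, bucket)].append(row)
    return dict(files)

# Hypothetical sample rows.
ROWS = [
    {"user_id": 1, "country": "IN"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": "IN"},
    {"user_id": 4, "country": "IN"},
]

if __name__ == "__main__":
    placed = layout(ROWS, "country", "user_id", 2)
    for (partition_dir, bucket), rows in sorted(placed.items()):
        print(partition_dir, "bucket", bucket, [r["user_id"] for r in rows])
```

A query that filters on the partition column only needs to read the matching directories, which is why partitioning helps on large tables. Bucketing spreads the rows inside a partition across a fixed number of files.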

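One way to embed custom code in a Hive query is the TRANSFORM clause, which streams rows to an external script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The script below is a minimal sketch of such a script, assuming a hypothetical input of two columns, `user_id` and a free-text `comment`. It emits the user id together with the number of words in the comment.

```python
import sys

def transform_line(line: str) -> str:
    """Turn one tab-separated input row into one tab-separated output row."""
    # Assumed input columns (hypothetical): user_id, comment.
    user_id, comment = line.rstrip("\n").split("\t", 1)
    word_count = len(comment.split())
    return f"{user_id}\t{word_count}"

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Process the input stream row by row, skipping blank lines.
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")

if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as count_words.py and registered with ADD FILE, a script like this could be called from a query of the form SELECT TRANSFORM(user_id, comment) USING 'python3 count_words.py' AS (user_id, word_count) FROM comments, where the table and column names are again assumptions.
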
Apache Hive Limitations



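Does not support OLTP: Apache Hive does not support online transaction processing (OLTP); it supports online analytical processing (OLAP).
Limited subquery support: Early versions of Hive allowed subqueries only in the FROM clause. Hive 0.13 and later also support subqueries in the WHERE clause (IN, NOT IN, EXISTS, NOT EXISTS), with some restrictions.
Latency: Hive query latency is very high.
Only non-real-time (cold) data is supported: Hive is not used for real-time data querying because it takes a while to produce a result.
Limited transaction processing: HQL was not designed for transaction processing. Hive 0.14 and later add ACID support for tables that are explicitly configured as transactional, but this is not a replacement for a transactional database.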