Difference Between Hadoop and Hive
Hadoop: Hadoop is an open-source software framework created to manage very large data sets (Big Data). It stores and processes data that is distributed across a cluster of commodity servers. Hadoop stores the data in the Hadoop Distributed File System (HDFS) and processes or queries it using the MapReduce programming model.
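
To make the MapReduce programming model more concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It does not use Hadoop at all; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are illustrative assumptions, and on a real cluster the framework would run many mappers and reducers in parallel over HDFS blocks.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster, different lines would be handled by different mapper tasks.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores big data", "Hive queries big data"]
    print(word_count(sample))
```

The point of the sketch is the division of labor: the programmer supplies only the map and reduce logic, while grouping by key is the framework's job.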
Hive: Hive is an application that runs on top of the Hadoop framework and provides an SQL-like interface for processing and querying data. Hive was designed and developed at Facebook before becoming part of the Apache Hadoop ecosystem. Queries are written in HQL (Hive Query Language, also called HiveQL). Hive organizes data into databases and tables much as an RDBMS does, so many familiar SQL commands also work in Hive. Hive can define external tables over data that already exists at a given location, so the data does not have to be moved into Hive-managed storage and, depending on the setup, does not have to reside in HDFS. Hive also supports file formats such as ORC, Avro, SequenceFile, and plain text files.
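
As a rough illustration of what an SQL-like interface looks like in practice, the sketch below runs a small aggregate query. An in-memory SQLite database stands in for Hive purely so the example runs locally; this is an assumption for demonstration, not how Hive is accessed. The query text uses only plain SQL constructs (`WHERE`, `GROUP BY`, `ORDER BY`) that HiveQL also accepts, and the `employees` table and its columns are hypothetical.

```python
import sqlite3

# Plain SQL with no engine-specific extensions; the same text is valid HiveQL.
QUERY = """
SELECT dept, COUNT(*) AS headcount
FROM employees
WHERE salary > 50000
GROUP BY dept
ORDER BY dept
"""


def run_query(rows):
    """Load rows into an in-memory SQLite table (a local stand-in for a Hive table) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    employees = [
        ("Asha", "eng", 90000),
        ("Ben", "eng", 45000),
        ("Chen", "ops", 60000),
        ("Dana", "eng", 70000),
    ]
    for dept, headcount in run_query(employees):
        print(dept, headcount)
```

The query only states what result is wanted; how the rows are scanned, grouped, and counted is left to the engine.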
Below is a table of differences between Hadoop and Hive:
| Hadoop | Hive |
|---|---|
| Hadoop is a framework for storing and processing Big Data. | Hive is an SQL-based tool built on top of Hadoop for processing that data. |
| Hadoop natively runs MapReduce jobs only. | Hive processes and queries data using HQL (Hive Query Language), an SQL-like language. |
| MapReduce is an integral part of Hadoop. | A Hive query is first converted into MapReduce jobs, which Hadoop then executes. |
| Hadoop does not understand SQL; equivalent logic has to be written as Java-based MapReduce code. | Hive works with SQL-like queries. |
| In Hadoop, you have to write complex MapReduce programs in Java, which follow a different style from ordinary Java applications. | In Hive, commands familiar from traditional relational databases can be used to query Big Data. |
| Hadoop is meant for all types of data: structured, semi-structured, and unstructured. | Hive is designed to process and query structured data. |
| In a plain Hadoop setup, processing the data requires writing complex Java programs. | With Hive, the same data can be processed and queried without complex programming. |
| A Java-based MapReduce program can need hundreds of lines of code. | Hive can often query the same data in 8 to 10 lines of HQL. |
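
The idea that a query is converted into MapReduce work can be sketched by hand-translating one aggregate query into a mapper and a reducer. This is only a conceptual model: the plan Hive actually generates may have several stages and optimizations not shown here, and the names `mapper`, `reducer`, and `run_job` as well as the `employees` record layout are hypothetical.

```python
from collections import defaultdict

# Query being hand-translated (HiveQL-style, shown for reference only):
#   SELECT dept, SUM(salary) FROM employees WHERE salary > 50000 GROUP BY dept


def mapper(record):
    """WHERE clause and GROUP BY key: skip non-matching rows, emit (dept, salary)."""
    _name, dept, salary = record
    if salary > 50000:
        yield dept, salary


def reducer(dept, salaries):
    """SUM(salary): aggregate every value that shares one key."""
    return dept, sum(salaries)


def run_job(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle/sort step
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    employees = [
        ("Asha", "eng", 90000),
        ("Ben", "eng", 45000),
        ("Chen", "ops", 60000),
        ("Dana", "eng", 70000),
    ]
    print(run_job(employees))
```

Even in this toy form, the one-line query expands into separate filtering, grouping, and aggregation code, which is the gap in effort that the last rows of the table describe.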