Difference Between Hadoop and HBase

Hadoop: Hadoop is an open-source framework from Apache that is used to store and process large datasets distributed across a cluster of servers. The four main components of Hadoop are the Hadoop Distributed File System (HDFS), YARN, MapReduce, and Hadoop Common (the shared libraries). It handles not only large volumes of data but also a mixture of structured, semi-structured, and unstructured data. Amazon, IBM, Microsoft, Cloudera, ScienceSoft, Pivotal, and Hortonworks are some of the companies using Hadoop technology.
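The MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is only a toy, single-process illustration of the programming model (split the input into chunks, map each chunk independently, group by key, then reduce); it does not use Hadoop itself, and the function names split_into_chunks, map_phase, shuffle, reduce_phase, and word_count are hypothetical names chosen for this example.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input into fixed-size chunks (loosely analogous to file blocks)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word found in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    """Run the toy pipeline: split, map each chunk, shuffle, reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "HBase stores rows", "Hadoop runs MapReduce"]
    print(word_count(sample))
```

Because each chunk is mapped without reference to the others, the map step is the part that a cluster can spread across many servers.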

HBase: HBase is an open-source database from Apache that runs on a Hadoop cluster. It is a non-relational database management system. The three important components of HBase are HMaster, RegionServer, and ZooKeeper. Capital One, JPMorgan Chase, Apple, MTB, AT&T, and Lockheed Martin are some of the companies using HBase.


Below is a table of differences between Hadoop and HBase:

S.No. | Hadoop | HBase
1 | Hadoop is a collection of software tools | HBase is a part of the Hadoop ecosystem
2 | Stores data sets in a distributed environment | Stores data in a column-oriented manner
3 | Hadoop is a framework | HBase is a NoSQL database
4 | Data is stored in the form of chunks (blocks) | Data is stored in the form of key/value pairs
5 | Hadoop does not allow run-time changes | HBase allows run-time changes
6 | A file is written once and can be read many times | Data can be read and written multiple times
7 | Hadoop has high-latency operations (batch processing) | HBase has low-latency operations (random reads and writes)
8 | HDFS can be accessed through MapReduce | HBase can be accessed through shell commands, the Java API, and REST
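Rows 4 and 6 of the table can be made concrete with a small in-memory sketch. The two classes below are toy stand-ins written for this comparison, not the HDFS or HBase APIs: WriteOnceFile and ToyTable are hypothetical names, the chunk size of 4 bytes is arbitrary, and the column name "info:name" is only an illustrative family:qualifier pair.

```python
class WriteOnceFile:
    """Toy stand-in for an HDFS-style file: written once, stored as chunks, then only read."""

    def __init__(self, data: bytes, chunk_size: int = 4):
        # The content is split into fixed-size chunks at write time.
        self._chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def chunk_count(self) -> int:
        return len(self._chunks)

    def read(self) -> bytes:
        # Reading reassembles the chunks; there is deliberately no update method.
        return b"".join(self._chunks)


class ToyTable:
    """Toy stand-in for an HBase-style table: values addressed by (row key, column)."""

    def __init__(self):
        self._cells = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Writing to an existing (row key, column) replaces the visible value.
        self._cells[(row_key, column)] = value

    def get(self, row_key: str, column: str):
        # Returns None when the cell does not exist.
        return self._cells.get((row_key, column))


if __name__ == "__main__":
    f = WriteOnceFile(b"hello world")
    print(f.chunk_count(), f.read())

    t = ToyTable()
    t.put("user1", "info:name", "Ada")
    t.put("user1", "info:name", "Grace")  # same cell written a second time
    print(t.get("user1", "info:name"))
```

The contrast to notice is in the interfaces: the file exposes only a whole-content read after creation, while the table supports repeated reads and writes of individual cells by key.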


