Difference Between Hadoop and HBase
Hadoop: Hadoop is an open-source framework from Apache used to store and process large datasets distributed across a cluster of servers. Its four main components are the Hadoop Distributed File System (HDFS), YARN, MapReduce, and Hadoop Common (the shared libraries). It handles not only large volumes of data but also a mixture of structured, semi-structured, and unstructured data. Amazon, IBM, Microsoft, Cloudera, ScienceSoft, Pivotal, and Hortonworks are some of the companies that use or provide Hadoop technology.
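To make the MapReduce component more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, written in plain Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real job would run these phases in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "HBase stores rows"]))
```

The point of the split is that each map call looks at only one line and each reduce call looks at only one key, which is what lets a framework distribute the work.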
HBase: HBase is an open-source database from Apache that runs on top of a Hadoop cluster. It is a non-relational (NoSQL) database management system. Its three important components are the HMaster, the RegionServers, and ZooKeeper. Capital One, JPMorgan Chase, Apple, MTB, AT&T, and Lockheed Martin are some of the companies using HBase.
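As a rough illustration of how a non-relational, column-oriented store of this kind organizes data, the sketch below models a table as row keys that map to `family:qualifier` columns, each holding timestamped versions. The `ToyTable` class and its methods are hypothetical and exist only for this example; they are not the HBase client API.

```python
import time
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with fixed column families."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": [(timestamp, value), ...]}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value, timestamp=None):
        """Store a new version of a cell; earlier versions are kept."""
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        ts = timestamp if timestamp is not None else time.time_ns()
        self.rows[row_key].setdefault(column, []).append((ts, value))

    def get(self, row_key, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


if __name__ == "__main__":
    users = ToyTable(["info"])
    users.put("user1", "info:name", "Ada", timestamp=1)
    users.put("user1", "info:name", "Ada L.", timestamp=2)
    print(users.get("user1", "info:name"))
```

Column families are declared up front, while qualifiers such as `name` can be added per row at write time, which is one reason this layout suits sparse data.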
Below is a table of differences between Hadoop and HBase:
| No. | Hadoop | HBase |
|---|---|---|
| 1 | Hadoop is a collection of software tools | HBase is part of the Hadoop ecosystem |
| 2 | Stores data sets in a distributed environment | Stores data in a column-oriented manner |
| 3 | Hadoop is a framework | HBase is a NoSQL database |
| 4 | Data is stored in HDFS as blocks | Data is stored as key/value pairs |
| 5 | Hadoop does not allow run-time changes | HBase allows run-time changes |
| 6 | Files are written once and read many times | Data can be read and written multiple times |
| 7 | Hadoop (HDFS with MapReduce) is geared to high-latency batch operations | HBase provides low-latency random reads and writes |
| 8 | Data in HDFS is typically processed through MapReduce | HBase can be accessed through shell commands, the Java API, and REST |
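
The write-once versus read/write distinction in the table can be sketched with two toy classes. Both are hypothetical in-memory models written for this comparison, not Hadoop or HBase APIs: `WriteOnceStore` refuses to overwrite an existing path (it allows appends, which HDFS also supports), while `KeyValueStore` lets any key be updated in place.

```python
class WriteOnceStore:
    """Toy model of write-once, read-many file semantics."""

    def __init__(self):
        self._files = {}

    def create(self, path, data):
        """Create a file; an existing path cannot be overwritten."""
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytes(data)

    def append(self, path, data):
        """Add bytes to the end of an existing file."""
        self._files[path] += bytes(data)

    def read(self, path):
        """Return the full contents of a file."""
        return self._files[path]


class KeyValueStore:
    """Toy model of a store that supports random reads and updates."""

    def __init__(self):
        self._cells = {}

    def put(self, key, value):
        """Insert a value or replace the one already stored under key."""
        self._cells[key] = value

    def get(self, key):
        """Look up a single key; returns None if it is missing."""
        return self._cells.get(key)


if __name__ == "__main__":
    files = WriteOnceStore()
    files.create("/logs/day1", b"first")
    files.append("/logs/day1", b" second")
    print(files.read("/logs/day1"))

    cells = KeyValueStore()
    cells.put("row1", "old")
    cells.put("row1", "new")
    print(cells.get("row1"))
```

In practice the two are used together: bulk data lands in files for batch processing, and records that need frequent lookups or updates go into the key/value layer.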