Prerequisite – Introduction to Hadoop
HBase is an open-source, distributed database modeled on Google’s Bigtable. It is written in Java and developed by the Apache Software Foundation. HBase is an essential part of the Hadoop ecosystem and runs on top of HDFS (Hadoop Distributed File System). It can store massive amounts of data, from terabytes to petabytes, and it is column-oriented and horizontally scalable.
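One way to picture this data model is as a sorted map: each row key maps to cells identified by a column family and a qualifier, and rows are kept in row-key order. The Java sketch below is only an in-memory illustration under that assumption. The class `SortedTableSketch` and its methods are hypothetical names chosen for this example and are not part of the HBase client API.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory model only; this is not the HBase client API.
final class SortedTableSketch {
    // row key -> "family:qualifier" -> value, with both levels kept sorted.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Store one cell; the row is created on first write.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    // Random access: look up a single cell by its full coordinates.
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    // Range scan: rows with startRow <= key < stopRow, in row-key order.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        SortedTableSketch table = new SortedTableSketch();
        table.put("user#002", "info", "name", "Bob");
        table.put("user#001", "info", "name", "Alice");
        table.put("user#001", "contact", "email", "alice@example.com");
        table.put("video#001", "info", "title", "Intro");

        System.out.println(table.get("user#001", "info", "name")); // Alice
        // Rows come back sorted by key regardless of insertion order.
        System.out.println(table.scan("user#", "user#~").keySet()); // [user#001, user#002]
    }
}
```

Because rows are sorted by key, related rows that share a key prefix sit next to each other, which is why a range scan over a prefix is cheap in this model.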
Features of HBase –
- It is linearly and modularly scalable, since data is distributed across the nodes of the cluster.
- HBase provides consistent reads and writes.
- It provides atomic reads and writes at the row level: a write to a single row is applied completely or not at all, so a concurrent reader never sees a partially updated row.
- It provides an easy-to-use Java API for client access.
- It supports Thrift and REST APIs for non-Java front ends, with XML, Protobuf and binary data encoding options.
- It supports a Block Cache and Bloom Filters for real-time queries and for high volume query optimization.
- HBase provides automatic failover between Region Servers.
- It supports exporting metrics to files through the Hadoop metrics subsystem.
- It doesn’t enforce relationships within your data.
- It is a platform for storing and retrieving data with random access.
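The Bloom filters mentioned in the list above are a probabilistic membership structure: a filter can answer "definitely not present" or "possibly present", which lets a read skip files that cannot contain the requested row. The following is a minimal, generic sketch of the idea in Java. It is not HBase's own implementation, and the class name `BloomFilterSketch`, the bit-array size and the choice of hash functions are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Generic Bloom filter sketch; HBase's own implementation differs in detail.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // number of bit positions probed per key

    BloomFilterSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // 32-bit FNV-1a, used here as the second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    // Record a row key by setting all of its bit positions.
    void add(String rowKey) {
        byte[] data = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(data, i));
        }
    }

    // false means the key was never added; true means it may have been.
    boolean mightContain(String rowKey) {
        byte[] data = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(data, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-1");
        filter.add("row-2");
        System.out.println(filter.mightContain("row-1")); // true: added keys are always found
        // An absent key usually reports false, but a false positive is possible.
        System.out.println(filter.mightContain("row-999"));
    }
}
```

The trade-off is that a larger bit array or a better-tuned number of probes lowers the false-positive rate at the cost of memory, while false negatives never occur.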
Facebook’s messaging platform originally used Apache Cassandra but moved to HBase in November 2010. Facebook wanted a scalable and robust infrastructure that could combine services such as messages, email, chat and SMS into a single real-time conversation, and it judged HBase to be best suited for that workload.
RDBMS Vs HBase –
- RDBMS is mostly Row Oriented whereas HBase is Column Oriented.
- RDBMS has a fixed schema, whereas in HBase columns can be added at run time.
- RDBMS is good for structured data whereas HBase is good for semi-structured data.
- RDBMS is optimized for joins but HBase is not optimized for joins.
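The schema difference in the comparison above can be sketched in a few lines of Java. The example assumes, as in HBase, that column families are declared up front while column qualifiers inside a family can be introduced by any write. The classes `FixedSchemaRow` and `FlexibleRow` are hypothetical stand-ins written for this illustration and do not come from any database API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of fixed versus run-time columns; not a real database API.
final class SchemaFlexibilitySketch {

    // RDBMS-style row: the column set is fixed when the table is defined.
    static final class FixedSchemaRow {
        private final Set<String> columns;
        private final Map<String, String> values = new LinkedHashMap<>();

        FixedSchemaRow(Set<String> columns) {
            this.columns = columns;
        }

        void set(String column, String value) {
            if (!columns.contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
            values.put(column, value);
        }
    }

    // HBase-style row: families are declared, qualifiers are free-form.
    static final class FlexibleRow {
        private final Set<String> families;
        private final Map<String, String> cells = new TreeMap<>();

        FlexibleRow(Set<String> families) {
            this.families = families;
        }

        void put(String family, String qualifier, String value) {
            if (!families.contains(family)) {
                throw new IllegalArgumentException("unknown column family: " + family);
            }
            // A new qualifier needs no schema change; it exists once written.
            cells.put(family + ":" + qualifier, value);
        }

        Set<String> columns() {
            return cells.keySet();
        }
    }

    public static void main(String[] args) {
        FlexibleRow row = new FlexibleRow(Set.of("info"));
        row.put("info", "name", "Alice");
        row.put("info", "nickname", "Al"); // column added at write time
        System.out.println(row.columns()); // [info:name, info:nickname]

        FixedSchemaRow fixed = new FixedSchemaRow(Set.of("name"));
        fixed.set("name", "Alice");
        try {
            fixed.set("nickname", "Al"); // would need a schema change first
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // unknown column: nickname
        }
    }
}
```

In this sketch each row carries only the cells that were written to it, so two rows in the same table can have different columns, which suits the semi-structured data mentioned in the comparison.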