
Difference between HBase and Cassandra

Last Updated : 08 Jun, 2022

1. HBase: HBase is an open-source, column-oriented database that provides random access to large amounts of structured data. It is built on top of the Hadoop Distributed File System (HDFS), stores its data there, and has that data replicated through HDFS. Its three main components are the HMaster, the RegionServers, and ZooKeeper.
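To make "random access" and "column-oriented" more concrete, here is a minimal in-memory sketch of an HBase-style layout: rows kept sorted by row key, columns named as "family:qualifier", and several timestamped versions per cell. It is only an illustration, not the HBase client API; the class `ToyWideColumnTable` and its methods are hypothetical names chosen for this example.

```python
import bisect
from collections import defaultdict


class ToyWideColumnTable:
    """Toy model: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self):
        self._row_keys = []  # kept sorted, mirroring how HBase orders rows by row key
        self._rows = {}      # row key -> {column: [(timestamp, value), ...]}

    def put(self, row_key, column, value, timestamp):
        """Store one version of a cell."""
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = defaultdict(list)
        versions = self._rows[row_key][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row_key, column):
        """Random access by row key: return the newest value, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_key, stop_key):
        """Yield (row_key, column, newest value) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, stop_key)
        for row_key in self._row_keys[lo:hi]:
            for column, versions in sorted(self._rows[row_key].items()):
                yield row_key, column, versions[0][1]


table = ToyWideColumnTable()
table.put("user#002", "info:name", "Bo", timestamp=1)
table.put("user#001", "info:name", "Ada", timestamp=1)
table.put("user#001", "info:name", "Ada L.", timestamp=2)  # newer version of the same cell
table.put("user#001", "stats:logins", 7, timestamp=1)
```

Calling `table.get("user#001", "info:name")` looks up a single cell directly by row key, while `table.scan(...)` walks a contiguous range of sorted row keys. A real HBase table additionally splits that sorted key space into regions served by different RegionServers.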

2. Cassandra: Cassandra is a distributed database designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure. Each piece of data is stored on several machines, according to a replication factor greater than one, so the data stays available when an individual node fails.
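The idea of placing each piece of data on several machines can be sketched with a toy hash ring. This is a simplified illustration, assuming one token per node and replicas chosen by walking clockwise around the ring; it is not Cassandra's actual partitioner or replication strategy code. The names `ToyRing`, `token`, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is used here only for determinism)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Toy ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key):
        """Return the nodes that hold copies of the given partition key."""
        start = bisect.bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
owners = ring.replicas("sensor-42")

# Simulate losing the first replica: the remaining owners can still serve the key.
live = [n for n in owners if n != owners[0]]
```

With a replication factor of 3, losing one of the owning nodes still leaves two copies of the data reachable, which is the availability property described above.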

Difference between HBase and Cassandra:

S. No Parameters HBase Cassandra
1. Infrastructure. HBase: Runs on Hadoop infrastructure and stores its data in HDFS. Cassandra: Does not depend on Hadoop; it runs as a standalone cluster with its own storage layer and operational tooling.
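2. Architecture Model. HBase: Based on a master-slave architecture, with the HMaster coordinating the RegionServers. Cassandra: Based on an active-active (masterless) architecture in which every node can accept reads and writes.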
3. Base of Database. HBase: Based on Google Bigtable. Cassandra: Combines the distribution design of Amazon Dynamo with a data model derived from Google Bigtable.
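4. Ordered Partitioning. HBase: Keeps rows sorted by row key and splits each table into regions by key range. Cassandra: Distributes data by hashing the partition key by default; an order-preserving partitioner (ByteOrderedPartitioner) is available but is generally discouraged because it tends to create hot spots.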
5. Single Point of Failure (SPoF). HBase: The cluster's availability depends on the availability of the Master node. Cassandra: All nodes are equal, so no single point of failure exists.
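6. Consistency. HBase: Provides strong consistency for reads and writes of a row. Cassandra: Offers tunable consistency and is typically run with eventual consistency, so it gives weaker guarantees than HBase unless stricter consistency levels are requested.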
7. Coprocessor. HBase: Supports coprocessors, which run user code on the RegionServers. Cassandra: Has no coprocessor functionality.
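  Triggers. HBase: Trigger-like behavior is supported through the coprocessor capability. Cassandra: No coprocessor-based triggers; Cassandra has a separate trigger mechanism (CREATE TRIGGER, backed by a Java class).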
8. Inter-communication. HBase: Relies on ZooKeeper for coordination between nodes; one node acts as the master and coordinates all the other nodes. Cassandra: Uses the Gossip protocol, in which each node periodically exchanges cluster state information with other nodes, so that information spreads from node to node without a central coordinator.
9. Query Language. HBase: Has no SQL-like query language; data is accessed through HBase shell commands and the client API, which must be learned. Cassandra: Has its own CQL (Cassandra Query Language), whose syntax closely resembles SQL.
10. Documentation. HBase: Not as easy to learn as Cassandra. Cassandra: Easier to learn, helped by better documentation than HBase.
11. Cluster Setup. HBase: Cluster setup is not easy. Cassandra: Cluster setup is easier than with HBase.
12. Rebalancing of Clusters. HBase: Supports automatic rebalancing within a cluster. Cassandra: Also supports rebalancing, but not of the entire cluster.
13. Transactions

HBase provides two mechanisms for handling transactions:

  • ‘Check and Put’
  • ‘Read-Check-Delete’

Cassandra provides two mechanisms for handling transactions:

  • ‘Compare and Set’
  • ‘Row-level Write Isolation’
14. CAP Theorem. HBase: Follows the CP (Consistency, Partition Tolerance) model. Cassandra: Follows the AP (Availability, Partition Tolerance) model.
15. Security. HBase: Permits access control down to the cell level. Administrators assign visibility labels to data sets and then tell user groups which labels they can access. Cassandra: Supports access control at the row level and works by assigning roles and conditions to users.
16. Reads and Writes. HBase: Well suited to read-intensive workloads. Cassandra: Well suited to write-intensive workloads.
17. Popular Use Cases
  • Online Log Analytics
  • Hadoop
  • Write Heavy Applications
  • MapReduce
  • Sensor Data
  • Messaging Systems
  • E-commerce Websites
  • Always-On Applications
  • Fraud Detection for Banks
18. Used by
  • Adobe
  • Xiaomi
  • Yahoo
  • eBay
  • Walmart
  • Netflix
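
The conditional writes named in row 13 (‘Check and Put’ for HBase, ‘Compare and Set’ for Cassandra) share one idea: a write is applied only if the current value matches what the caller expects. The sketch below shows that idea with a plain in-memory dictionary guarded by a lock. It is not the HBase or Cassandra client API, and `ToyConditionalStore` and its method names are hypothetical.

```python
import threading


class ToyConditionalStore:
    """Toy key-value store with atomic check-then-write operations."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # makes each check-and-write step atomic

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def check_and_put(self, key, expected, new_value):
        """Write new_value only if the current value equals expected.

        expected=None means "the key must be absent" in this toy model.
        """
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new_value
            return True

    def check_and_delete(self, key, expected):
        """Delete the key only if it exists and its value equals expected."""
        with self._lock:
            if key not in self._data or self._data[key] != expected:
                return False
            del self._data[key]
            return True


store = ToyConditionalStore()
first = store.check_and_put("seat-12A", None, "alice")   # seat is free, so the write applies
second = store.check_and_put("seat-12A", None, "bob")    # seat is taken, so the write is rejected
```

Because the check and the write happen as one atomic step, two clients racing for the same key cannot both succeed. In the real systems the same guarantee has to hold across machines, which is why these operations cost more than ordinary writes.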
